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ABSTRACT 


Multigrid  methods  have  been  traditionally  applied  to  the  solution  of  certain  Partial 
Differential  Equations.  However,  applications  in  control  theory,  optimization,  pattern 
recognition,  computational  tomography  and  particle  physics  are  beginning  to  appear. 

This  thesis  analyzes  the  application  of  multigrid  methodology  to  optimization 
problems.  The  work  is  centered  on  networks.  Transportation  problems  are  chosen 
frequently  as  reference  because  they  have  been  the  object  of  some  multigrid  research.  The 
goal  is  to  establish  a  basis  for  development  of  multigrid-based  algorithms. 

Optimality  conditions  in  linear  programming  and  networks  are  reviewed,  and  a 
compilation  of  various  multilevel  approaches  in  optimization  is  presented.  Emphasis  is  on 
the  recent  scaling  techniques;  they  add  some  special  insights  into  solving  large  network 
problems  efficiently  using  progressive  level  of  detail.  An  analysis  of  the  difficulties  that 
these  problems  present  to  the  multigrid  approach  reveals  that  perhaps  some  abstraction 
is  appropriate  when  interpreting  multigrid  components  applied  to  optimization  problems 
(in  particular,  the  concept  of  grid  itself).  The  idea  of  implicit  ordering  is  developed  and 
associated  with  the  effectiveness  of  the  multigrid  method.  These  concepts  are  applied  to 
identify  problems  that  can  be  solved  using  multigrid.  Finally,  suggestions  for  the 
development  of  future  multigrid-based  algorithms  are  provided. 
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EXECUTIVE  SUMMARY 


Multi-level  processes  are  found  in  many  military,  government  and  private  sector 
activities.  In  many  cases  the  multi-level  processes  are  easy  to  observe  because  they  are 
inherent  in  the  activity  itself.  For  example,  in  the  shipment  of  goods  from  factories  to 
warehouses  and  then  to  customers  the  multi-level  nature  of  the  distribution  is  obvious.  In 
other  cases  the  multi-level  processes  are  in  the  structure  or  environment  in  which  the 
activity  is  developed.  Fur  example,  military  personnel  assignment  has  multiple  levels 
because  of  way  that  jobs  are  assigned  according  to  specialties  or  ranks.  In  still  other 
cases  multilevels  are  introduced  where  none  is  inherent  in  order  to  break  the  problem  into 
parts  that  may  be  solved  more  easily  than  the  original.  Multilevel  problems  are  virtually 
all  very  large  and  it  is  usually  necessary  to  develop  solution  techniques  that  exploit  special 
structure  associated  with  the  multilevel  processes  in  order  to  effectively  construct 
solutions. 

Multigrid  methods  are  a  multilevel  approach  to  solving  problems.  The  basic  idea  is 
to  sample  problems  at  different  levels  of  detail  so  that  their  simpler  representation  on 
coarser  grids  can  reduce  computational  effort.  They  seem  to  be  particularly  effective  when 
there  is  some  inherent  hierarchical  structure  or  when  a  hierarchical  structure  can  be 
induced  on  the  problem.  In  addition  to  hierarchical  structure  there  are  other  problem 
characteristics  that  are  critical  to  the  success  of  multigrid  methods. 

Traditionally,  multigrid  methods  have  been  applied  to  the  solution  of  certain  Partial 
Differential  Equations,  but  they  are  much  more  widely  applicable  than  just  to  the  numerical 
solution  of  differential  and  integral  equations.  Applications  in  such  diverse  areas  as  control 
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The  present  thesis  analyzes  the  possibilities  of  applying  the  multigrid  methods  to 
solve  optimization  problems.  The  work  is  centered  on  network  problems,  particularly 
transportation  problems.  The  reason  for  this  choice  is  that  the  efficient  solution  of  large 
network  problems  is  important  in  many  military  and  industrial  activities,  and  also  there  has 
been  research  on  the  application  of  multigrid  methods  to  transportation  problems.  The 
thesis  does  not  actually  solve  any  network  problem.  The  goal  is  to  develop  a  basis  for 
future  development  of  algorithms  that  take  advantage  of  the  powerful  characteristics  of 
multigrid  to  solve  large  networks  and,  ultimately,  more  general  optimization  problems. 

This  thesis  provides  an  overview  of  the  basic  aspects  of  optimality  in  linear 
programming  and  networks,  and  a  compilation  of  various  multilevel  approaches  in 
optimization.  Special  emphasis  is  made  on  the  recent  scaling  techniques;  they  add  some 
special  insights  into  the  task  of  solving  large  network  problems  in  a  progressive  level  of 
detail  and  are  very  efficient  computationally.  An  analysis  of  the  difficulties  that  those 
problems  present  to  the  multigrid  approach  reveals  that  perhaps  a  high  level  of  abstraction 
is  appropriate  when  interpreting  the  application  of  multigrid  components  to  optimization 
problems.  New  ideas  to  interpret  the  concept  of  grid  are  proposed,  and  the  idea  of  implicit 
ordering  is  associated  with  the  effectiveness  of  the  multigrid  method.  Finally,  these 
concepts  are  applied  to  identify  problems  to  be  solved  using  multigrid.  The  conclusions 
provide  some  new  insights  and  give  some  suggestions  for  the  development  of  future 
multigrid-based  algorithms  oriented  to  the  solution  of  optimization  problems. 
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I.  INTRODUCTION 


A.  MOTIVATION 

Multilevel  algorithms  have  emerged  as  the  most  efficient  algorithms  for  solving 
certain  very  large  algebraic  systems  arising  from  discretizing  partial  differential  boundary 
value  problems.  Multilevel  approaches  have  naturally  emerged  in  many  branches  of 
computer  technology,  as  in  the  structured  organization  of  computer  hardware,  the  top- 
down  structured  design  of  software,  the  pyramidal  data  structures  (trees,  heaps,  etc.)  and 
many  of  the  most  efficient  algorithms  in  computer  science,  such  as  fast  sorting  and  others 
in  the  "divide  and  conquer"  class  of  algorithms.  The  multigrid  approach  adds  some  other 
characteristics  to  establish  a  powerful  and  general  framework  for  the  solution  of  a  wide 
class  of  problems.  This  research  explores  the  suitability  of  such  methods  to  the  solution 
of  optimization  problems. 


B.  BASIC  CONCEPTS 

In  the  numerical  solution  of  differential  equations,  each  variable  is  represented  by 
a  vector  of  real  values.  This  means  that  the  solution  of  the  problem  can  be  regarded  as 
an  array  of  vectors,  each  corresponding  to  an  individual  variable.  In  the  simpler  one¬ 
dimensional  case,  the  solution  of  the  problem  is  a  single  vector,  which  we  will  refer  to  as 
the  solution  vector 
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Considering  the  one-dimensionai  case  (Hackbusch,  1980),  relaxation  methods,  i.e. 
Jacobi  and  Gauss-Seidel  iterations,  begin  with  an  initial  guess  of  a  solution.  They  then 
proceed  to  improve  the  current  approximation  by  updating  steps  (iterations).  If  we  could 
represent  the  error  at  each  step,  it  would  be  a  vector  of  the  same  number  of  components 
as  the  solution  vector.  It  is  referred  to  as  the  error  vector.  At  the  beginning  of  the 
iterative  process,  the  error  vector  is  somewhat  arbitrarily  constructed.  The  arbitrariness 
is  related  to  the  choice  of  a  first  guess  taken  as  initial  solution.  Some  problems  present 
special  behaviors  of  the  error  vector  along  the  iteration  steps.  We  are  interested  in  the 
kind  of  problems  in  which  the  error  can  be  represented  by  a  linear  combination  of 
frequency  components  (i.e.  Fourier  nodes).  Moreover,  we  are  interested  in  errors  that 
through  the  iterative  process  are  "smoothed11  to  end  up  being  formed  by  only  long  waves. 
(At  this  point,  it  suffices  to  say  that  long  waves  are  those  with  relatively  low  frequency 
components).  It  is  hard  to  detect  such  structure  in  the  error  vector,  and  even  harder  to 
know  about  its  evolution.  In  fact,  to  detect  this  structure,  we  need  to  know  the  error  itself, 
and  knowing  the  error  is  equivalent  to  having  solved  the  problem.  Again,  it  is  the  evolution 
of  the  error  that  attracts  our  attention  to  these  problems,  because  it  makes  them  suitable 
for  special  solution  techniques,  known  as  multigrid  techniques. 

In  these  problems  the  error  vector,  in  general,  will  consist  of  high  frequency 
oscillatory  components,  medium  frequency  components,  and  low  frequency  smooth 
components.  Conventional  relaxation  methods,  as  those  mentioned  above,  reduce  the 
oscillatory  components  first.  When  the  remaining  error’s  spectrum  is  formed  essentially 
by  smooth  components  a  progressive  slowness  in  the  convergence  to  the  solution  is 
observed.  At  this  point,  the  iterative  processes  become  extremely  slow. 
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The  multigrid  approach  is  to  treat  each  component  of  the  error  on  the  environment 
(grid)  where  that  component  appears  to  be  "oscillatory*. 

The  multigrid  approach  takes  advantage  of  the  following  facts: 

Smooth  components  of  the  error  are  well  represented  on  coarser  grids.  This 
facilitates  solving  the  problem  on  a  "simpler"  domain; 

When  transferring  smooth  components  into  the  next  domain  (coarser  grid), 
moderately-smooth  components  of  the  error  become  oscillatory  components 
of  the  error  in  the  new  domain.  This  fact  speeds  up  the  iterative  process  since 
oscillatory  components  are  reduced  faster. 

The  complementary  use  of  these  two  ideas  constitutes  the  basis  of  multigrid 
method.  Both  of  them  contribute  to  accelerate  convergence  and,  with  the  help  of  some 
intuition,  explain  the  success  of  that  approach. 

Two  main  tools  are  used  in  approaching  the  solution  of  problems  in  multigrid 
(Briggs.  1987): 

(a)  Use  of  coarser  grids  to  obtain  initial  guesses  for  the  problem  on  finer  grids 
(nested  deration)-, 

(b)  Use  of  approximate  solution  in  fine  grids  to  obtain  a  residual,  defined  as  the 
difference  between  the  true  and  approximated  right  hand  side  vectors.  The 
idea  is  to  relax  until  the  error  is  smooth,  then  transfer  the  residual  equation  to 
coarser  grids  to  relax  on  the  error,  and  finally  correct  the  approximate  solution 
in  the  finer  grid  after  interpolating  the  error,  (coarse  grid  correction). 
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C.  INTERPRETATION  OF  THE  IDEA  IN  OPTIMIZATION  PROBLEMS 

The  above  ideas  suggest  that  a  parallel  approach  could  be  implemented  in  solving 
optimization  problems.  The  bask:  ideas  should  be: 

(a)  Nested  Iteration.-  Form  a  reduced  problem  at  the  simplest  possible  level. 
Solve,  and  use  the  solution  to  obtain  a  good  initial  guess  to  the  extended 
problem  at  the  next  level.  Proceed  in  this  fashion  until  the  finest  level  is 
reached. 

(b)  Coarse  Grid  Correction.-  Find  an  initial  guess  to  the  problem,  that  we  will  call 
the  "complete  problem".  Work  on  this  guess  to  obtain  an  improvement.  Iterate 
until  a  point  of  little  advance  is  reached.  Then  shift  to  a  "reduced  problem", 
derived  from  the  complete  problem.  Solve  the  reduced  problem  using  a  similar 
(nested)  approach.  Use  this  result  to  improve  the  current  solution  to  the 
complete  problem. 

D.  OBJECTIVE  AND  CONTENTS  OF  THE  THESIS 

One  goal  of  this  thesis  is  to  investigate  the  advances  made  so  far  in  this  field,  in 
particular,  the  application  of  the  multigrid  philosophy  to  the  solution  of  long  transportation 
problems.  Previous  work  shows  some  success  in  obtaining  approximate  solutions  to  the 
problem,  but  on  large  problems  the  results  are  not  very  promising.  This  research  steps 
back  to  a  more  general  and  abstract  level  in  an  attempt  to  find  characteristics  helping  to 
define  problems  amenable  to  multigrid  approach.  Also,  some  of  the  essential  concepts 
characterizing  multigrid  are  generalized  to  a  higher  level  of  abstraction  in  order  to  seek 
new  interpretations  for  the  multigrid  techniques  when  considered  in  the  special  context  of 
optimization. 
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It  is  an  objective  of  this  thesis  to  gather  what  we  could  call  a  set  of  fundamental 
"building  blocks*.  Multigrid  is  not  a  developed  technique  in  optimization.  Nested  iteration 
and  coarse  grid  correction  require  special  handling  in  an  optimization  environment.  We 
wish  to  review  some  of  those  optimization  techniques  already  available  that  can  be 
exploited,  particularly  those  likely  to  be  more  adaptable  to  a  multigrid  use.  Multilevel 
methods  are  a  logical  choice,  so  we  examine  and  include  them  in  the  multigrid  context. 

An  important  feature  of  real-life  network  problems,  including  transportation  problems, 
is  the  arc  capacity.  In  the  multigrid  context,  capacity  has  a  special  importance,  since  it 
tends  to  make  the  behavior  of  networks  less  smooth  in  response  to  iterative  processes, 
causing  multigrid  implementation  to  be  more  problematic.  We  want  to  pay  special  attention 
to  arc  capacity  and,  whenever  possible,  include  it  as  a  factor  in  the  problem  design.  This 
is  an  aspect  that  has  not  been  considered  before. 

Finally,  this  work  will  try  to  open  the  interpretation  of  the  multigrid  idea  in 
optimization  to  possible  ways  of  restriction,  other  than  the  "traditional"  node  aggregation. 
This  involves  an  abstraction  of  the  concept  of  "grid".  Also,  possible  ways  to  improve  the 
relaxation  process  are  suggested  using  the  mentioned  building  blocks. 

Chapter  II  is  a  description  of  the  transportation  problem.  This  type  of  problem  has 
been  historically  selected  to  investigate  new  algorithms.  This  is  mainly  due  to  its  simplicity 
in  structure.  In  this  chapter  we  present  the  transportation  problem  as  a  member  of  the 
class  of  minimum  cost  problems.  Duality,  a  key  concept  in  optimization  theory  that  will  be 
found  along  the  pages  to  follow,  is  introduced.  Specifically,  we  study  the  meaning  of  the 
dual  associated  to  the  transportation  problem. 
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Although  the  long  transportation  problem  is  our  selected  problem  type,  a  general 
approach  to  the  solution  of  optimization  (network)  problems  is  kept  in  mind  at  all  time. 
Occasionally,  we  will  be  willing  to  beat  a  more  general  type  of  problem. 

The  optimality  conditions  in  networks  are  the  basic  set  of  tools  which  are  constantly 
invoked  in  this  work.  It  is  natural  then  to  devote  some  space  to  set  up  these  conditions 
in  networks.  That  is  done  in  Chapter  III,  where  it  is  preceded  by  a  short  summary  of  the 
more  general  linear  programming  optimality  conditions. 

Chapters  IV,  V,  and  VI  describe  the  above  mentioned  building  blocks.  These 
chapters  describe  with  some  detail  the  decomposition  methodology,  the 
aggregation/disaggregation  method,  and  the  more  recent  scaling  techniques,  respectively. 
We  describe  them  and  give  some  examples.  This  is  especially  detailed  in  the  chapter 
devoted  to  scaling.  This  last  technique,  dated  1972  (Edmonds  and  Karp),  has  been  more 
extensively  researched  in  recent  years  (Bertsekas,  1979;  Bland  and  Jensen,  1992,  Ahuja 
etal,  1993),  and  appears  to  be  promising. 

Chapter  VII  studies  the  perturbation  method,  it  is  an  application  of  the  more  general 
perturbation  philosophy  to  solving  linear  programming  problems. 

In  Chapter  VIII  some  further  work  in  the  multigrid  concept  is  described.  A  certain 
familiarity  with  the  method  is  assumed  on  the  part  of  the  reader.  This  work  adds  some 
insights  to  the  comprehension  of  the  multigrid  methodology. 

Chapter  IX  develops  some  ideas  to  identify  the  type  of  optimization  problems  that 
could  be  expected  to  be  solved  using  multigrid.  A  short  overview  of  previous  work  on  the 
topic  is  presented,  with  conclusions  about  critical  aspects  of  the  implementation  of 
multigrid  algorithms  in  solving  optimization  problems,  and  suggestions  for  further  research. 
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II.  OPTIMALITY  CONDITIONS 


A.  INTRODUCTION 

In  this  chapter  we  present  the  Karush-Kuhn-Tucker  (KKT)  optimality  conditions. 
These  are  necessary  and  sufficient  for  linear  programming  problems.  The  case  of  equality 
constraints  is  specifically  considered.  A  key  result  in  establishing  the  KKT  conditions  is  the 
lemma  presented  in  section  B.  Finally,  in  section  D  we  specialize  the  optimality  conditions 
to  the  case  of  networks. 

A  convex  cone  generated  by  the  vectors  v(1>,  v*3> . v<p)  is  (Owen,  1982)  the  set 

of  all  vectors  v  such  that 

v  =  y  xkv <k) 

for  some  nonnegative  Xv  A*, 

NOTE  (Special  notation):  In  this  chapter  we  will  use  dot-notation  to  refer  to  sets  defined 
using  two  indices,  in  the  following  way:  to  group  elements  indexed  (i,j),  with  a  constant 
value  of  i,  and  all  possible  values  for  j  we  use  the  special  notation  V  (the  dot  is  part  of 
the  notation).  So,  the  ith  row  of  a  matrix  A  is  denoted  A,. ,  while  the  jth  column  of 
A  is  denoted  as  A  {  . 
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B.  LEMMA  (Farias’  Lsmms) : 

Ons  and  only  one  of  the  following  two  systems  has  a  solution: 

System  1:  w  A  =  c,  w  £  0 

System  2:  Ax  >  0,  cx  <  0  (11.1) 

where  A  is  a  given  m  x  n  matrix  and  c  is  a  given  n-vector  (a  row  vector). 

(NOTE:  There  are  other  versions  of  the  lemma;  this  is  taken  from  Bazaraa,  Jarvis  and 
Sheraii,  1990).  We  do  not  include  a  proof,  but  a  geometric  interpretation. 

Geometric  Interpretation:  System  1  has  a  solution  if  and  only  if  c  belongs  to  the  convex 
cone  generated  by  tee  rows  of  A.  On  the  other  side,  system  2  has  a  solution  if  and  only 
if  tee  vector  c  does  not  belong  to  the  cone  generated  by  tee  rows  of  A. 

C.  KARUSH-KUHN-TUCKER  (KKT)  OPTIMALITY  CONDITIONS 

We  take  this  version  of  tee  KKT  optimality  conditions  from  Bazaraa,  Jarvis  and 
Sheraii,  1990.  Some  definitions  are  required. 

A  nonempty  set  S  in  9T  is  said  to  be  convex  if  the  line  segment  joining  any  two 
points  in  the  set  also  belongs  to  the  set.  This  is  expressed  as 

X  x  +  (1  -  X)  y  e  S,  for  all  0  <,  X  £  1  and  all  x,  y  e  S 
A  real-valued  function  f  defined  on  such  set  S  is  said  to  be  a  convex 
function  if  for  every  pair  x,  y  e  S  and  for  every  real  number  X  contained  in  the 
open  interval  (0,1),  the  following  holds 

f(Xx  +  (1  -  X)  y)  <  Xf(x)  +  <1  -  X)  f(y) 
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Let  x  be  a  member  of  a  convex  set  S.  A  nonzero  vector  d  is  called  a  feasible 
direction  at  x  if  there  exists  a  5  >  0,  such  that  for  all  k  in  the  open  interval  (0,  8),  the 
point  x  +  k  d  is  in  the  set  S. 

A  nonzero  vector  d  is  called  an  Improving  feasible  direction  at  x  if  there 
exists  a  S  >  0,  such  that  for  all  k  in  the  open  interval  (0,  5): 

(i)  x  +  k  d  is  a  member  of  the  set  S; 

(ii)  c  (x  +  k  d  )  <  c  x  (for  minimization  problems,  a  descent  direction). 

Note  that  in  linear  programming  problems,  the  cost  vector  c  represents  the  gradient 

of  the  objective  function.  Condition  (ii)  in  this  case,  is  equivalent  to  c  d  <  0. 

Definition:  We  say  that  a  constraint  is  binding  at  x  if  it  is  satisfied  with  equality 
at  that  point. 

Now  consider  the  linear  programming  problem. 

min  c  x 

s.t.  A  x  >  b 

x  >  0 

For  the  above  linear  programming  problem  let  x  be  a  feasible  solution  and  G  the 
submatrix  of  A  formed  by  the  binding  constraints  at  x.  The  rows  of  G  can  be  one  of  the 
following  two  types: 

(a)  Rows  of  A,  of  the  form  A,  such  that  ALx  =  b,  ;  the  index  set  for  these 
constraints  will  be  denoted  as  / ; 
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(b)  Rows  formed  by  zeros,  except  for  a  *1’  in  the  jth  position.  The  index  set  for 
these  will  be  called  J.  These  constraints  belong  to  the  group  of  nonnegativity 


constraints  in  foe  linear  programming  problem. 

In  other  words, 

I  -  {  i  :  A,  x  *  b,  ) 

J  -  { j  :  x,  =  0  } 

Now  suppose  that  x  is  an  optimal  solution.  Then  there  cannot  be  at  x  any 
improving  feasible  direction  d.  Thus,  (i)  and  (ii)  cannot  be  satisfied  simultaneously  for  any 
direction  d.  So,  it  cannot  be  that  cd  <  0  and  G  (x  +  X  d)  >  g. 

Let  g  be  the  vector  defined  by  foe  binding  constraints  G  x  =  g.  The  second 
inequality  above  can  be  expanded  as 

Gx  +  X  G  d  >  g,  which  is  equivalent  to  G  d  >  0. 

Since  there  cannot  be  a  direction  d  such  that  cd  <  0  and  G  d  >  0  hold 
simultaneously,  we  can  make  use  of  Farkas’  Lemma  to  state  that  there  exists  a  vector 
u  >  0  such  that  uG  =  c. 

Let  us  write  such  vector  as  u  =  (a,  (3),  where 

a  =  (u,  :  i  e  I) 
p  =  (u, :  j  e  J) 

then  the  conditions  u  £  0  and  uG  =  c  can  be  rewritten  as 

E«.  A.  +  EM  -  c  (ii.2) 

lei  JeJ 
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with 


a*  *  0,  iel  and  p3  *  o,  jeJ  (11.3) 

These  are  the  KKT  conditions  for  optimality.  The  interpretation  is:  x  is  an  optimal 
solution  to  the  linear  program  above  if  and  only  if  the  objective  gradient  c  lies  in  the 
convex  cone  generated  by  the  gradients  of  the  binding  constraints  at  x  . 

These  conditions  are  usually  written  in  the  following  form.  Define  the  vectors 

a  «  (a1,a2,...,om)  *  0 

P  *  (P,.P2 Pn)  *  0 

their  components  having  value  zero  for  those  indices  not  belonging  to  I,  J,  respectively 
(recall  that  the  lengths,  m  and  n,  are  precisely  the  number  of  constraints  and  the  number 
of  variables  of  the  problem).  Those  are  indices  corresponding  to  the  nonbinding 
constraints.  Then  the  KKT  conditions  hold  for  vectors  (x,  a,  (3)  if  there  exists  a  solution  to 
the  system 


Ax  i  b, 

X  2  0 

(11.4) 

«A  +  p  =  c, 

12  0,  0  2  0 

(H.5) 

a  (Ax  -  b)  =  0, 

Px  =  0 

(11-6) 

The  first  condition  (primal  feasibility)  states  that  the  candidate  point  must  be 
feasible.  Condition  (11.5)  is  the  dual  feasibility.  Here  a  and  [3  are  the  Lagrangian 
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multiplier*  or  duel  variables.  Finally,  the  last  condition  is  the  complementary  slackness 
condition. 

The  complementary  slackness  condition  is  the  expression  of  the  so  called 
Complementary  Slackness  Theorem,  stating  that  if  a  pair  of  primal  and  dual  solutions 
are  feasible  and  satisfy  the  complementary  slackness  condition,  they  are  optimal. 

An  immediate  consequence  of  the  above  is  the  following: 

Corollary:  The  following  two  statements  are  equivalent 

(i)  x  is  an  optimal  solution  for  the  primal  problem  and  Q  is  an  optimal  solution 
for  the  dual  problem; 

(ii)  (x,  Q),  is  a  feasible  solution  for  the  pair  of  problems,  primal  and  dual. 

In  the  case  of  equality  constraints,  which  concerns  the  transportation  problem,  we 

have: 


•  min  c  x 
s.t.  A  x  =  b 

x  >  0 

We  can  transform  the  equality  into  a  logical  combination  of  two  inequalities, 

(Ax  =  b)  *  (Ax  >  b)  and  (-  A  x  >  -  b)  (11.7) 
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As  a  result,  the  values  of  a  are  unrestricted  at  the  points  satisfying  the  KKT 


conditions,  and  these  are  expressed  in  the  form: 


A  x 

=  b. 

x  2  0 

(11.8) 

a  A  +  P  =  c, 

a  unrestricted,  P  >  0 

(11.9) 

Px 

=  0 

(11.10) 

Suppose  x  is  an  optimal  solution.  Then,  in  order  to  satisfy  the  complementary 
slackness  condition  (11.9),  it  must  be  that 

(Pb  >  Pn)  (*b  •  Xn)  =  Pb  *b  +  Pn  *n  =  ®  (11.1  1 ) 

where  the  vectors  p  and  x  have  been  expressed  in  blocks  of  components  associated 
to  the  basic  and  nonbasic  variables  (subscripts  B,  N  respectively). 

Now,  since  xN  =  0  ,  it  must  be  that  PB  xB  =  0,  and  (11.11)  holds  when  pB  =  0. 
Then  (11.8)  can  be  exoressed: 

(CB.  cN)  -  a(B,  N)  -  (P8,  pN)  =  (0.  0)  (H.12) 

equivalent  to 

(cB  -  aB  =  0)  and  (cN  -  on  -  PN  =  o)  (11-13) 

where  N  is  the  submatrix  of  A  formed  by  the  rows  associated  to  the  nonbasic  variables. 
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D.  OPTIMALITY  CONDITIONS  IN  NETWORKS 

The  theorems,  properties  and  corollaries  in  this  section  can  be  found  in  Ahuja  etal, 
1993.  Suppose  N  is  a  network.  A  set  of  node  potentials  is  a  mapping  n:  N  ->  K.  Let 

n  =  (n(1),  n(2) . 7i(n)}  be  a  set  of  node  potentials.  For  compactness,  sometimes  we  will 

treat  n  as  a  vector.  We  define  the  reduced  coet  of  an  arc  (i,j)  as 

Cf  *  C|  -  7T(I)  +  75  0)  (11.14) 

We  use  the  superscript  to  indicate  that  the  reduced  cost  is  associated  with  the  set 
n.  Note  that  when  n  =  0  the  reduced  cost  vector  is  identical  to  the  cost  vector.  (Also 
for  compactness,  we  refer  to  both  cost  sets  in  terms  of  vectors,  which  we  will  do 
sometimes  to  simplify  the  notation). 

Lemma:  Suppose  we  want  to  solve  a  minimum  cost  flow  problem  using  as  set  of  costs 
the  reduced  costs  associated  with  a  set  of  node  potentials  7t.  Let  z(tc)  denote  the  value 
of  the  objective  func*!on  value  of  this  problem,  and  z(0)  the  objective  value  considering 
the  usual  costs.  The  flow  values  are  the  same  in  both  cases.  Then  we  have 

52  °ii  '  52  ~  ^  *0Wi) 

IJ  IJ  UN 

or 

z(ti)  «  z(0)  -  ltd 

where  d  is  the  vector  of  demands  and  n  the  vector  of  node-potentials. 

Proof.  Set  all  node  potentials  to  zero.  Increasing  the  potential  of  node  k  from  0  to  n(k) 
modifies  the  reduced  costs  as  follows: 


11.15) 

(11.16) 
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-  ail  incoming  arcs  to  node  k  increase  reduced  costs  by  it(k); 

•  all  arcs  leaving  k  decrease  reduced  costs  by  n( k). 

This  causes  the  value  of  the  objective  function  to  decrease  to: 

zog  =  z(0)  +  w(k)£x*  -  n(k)£x* 

i  i 

=  z(0)  -  n(k)d(k)  (H.17) 

Proceeding  as  above  for  all  the  nodes  yields  (11.16) 

Corollary  (lt-1):  Optimal  minimum  cost  flow  problems  with  arc  costs  c„  or  have  the 
same  flow  distributions. 

Corollary  (11-2):  For  any  directed  cycle  C  and  for  any  set  of  node  potentials  it: 

Ecu*  -  Ecu  (».i8) 

c  c 

Corollary  (11-3):  For  any  directed  path  P  from  k  to  I : 

E°i"  -  £e,  -  *(k)  .  *(l)  (H-19) 

Let  u  represent  the  vector  of  arcs  capacities.  We  define  the  residual  network 
with  respect  to  a  given  flow  x  as  follows:  replace  each  arc  (i,j)  in  the  original  network  by 
two  arcs  (i.j)  and  (j,i): 

(i)  the  arc  (i.j)  has  cost  cy  and  residual  capacity  ri5  =  u4j  -  xi;j  and 

(ii)  the  arc  (j><)  has  cost  -c±j  and  residual  capacity  r  j±  «  x  . 

The  residual  network  consists  of  only  the  arcs  with  a  positive  residual  capacity. 
The  following  theorems  (Ahuja  etal,  1993)  state  the  optimality  conditions  in  network 
problems  (for  brevity,  the  proofs  have  not  been  included  here). 
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Theorem  11-1  (Negative  Cycle  Optimality  Conditional:  A  feasible  solution  x  is  an 
optimal  solution  of  the  minimum  cost  problem  if  and  only  if  the  residual  network  G(x) 
contains  no  negative  cost  (directed)  cycle. 

Theorem  U-2  (Reduced  Coat  Optimality  Conditional:  A  feasible  solution  x  is  an 
optimal  solution  of  the  minimum  cost  problem  if  and  only  if  some  set  of  node  potentials 

n  satisfy  c £  2  0  for  every  arc  (i,j)  in  the  residual  network  G(x). 

Theorem  11-3  (Complementary  Slackneaa  Optimality  Conditional:  A  feasible  solution 
x  is  an  optimal  solution  of  the  minimum  cost  problem  if  and  only  if  for  some  set  of  node 
potentials  n  ,  the  reduced  costs  and  flow  values  satisfy  the  following  for  every  arc  (i,j)  in 
the  network: 


(0 

If 

Cjj  >  0, 

then 

j<l 

11 

0 

(11.20) 

(ii) 

If 

0 

A 

JX| 

<  u, 

then  c,"  =  0 

(H.21) 

(iii) 

If 

c  *  =  0, 

''ll  w» 

then 

**  =  ut) 

(11.22) 

The  above  sections  provide  a  method  for  checking  for  optimality  of  solutions  in 
optimization  problems,  and  its  specialization  to  network  flows.  This  will  be  useful  when 
revising  particular  techniques  in  the  chapters  to  follow. 
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III.  THE  TRANSPORTATION  PROBLEM  IN  THE  CONTEXT  OF  MINIMUM  COST 
PROBLEMS 


A.  INTRODUCTION 

The  transportation  problem  was  first  introduced  by  Hitchcock  in  1 941 ,  and  since  then 
a  wide  variety  of  seemingly  different  problems  have  been  shown  to  be  equivalent  to  the 
transportation  problem.  Historically,  the  transportation  model  has  served  as  "laboratory 
animal’  in  the  development  of  optimization  technology  (Dantzig,  1963).  No  matter  how 
large  the  problem  at  hand,  it  possesses  a  simple,  exploitable  structure. 

The  basic  structure  is  the  following:  a  company  owns  m  warehouses  (origins,  or 
supply  nodes),  in  each  of  which  there  is  a  given  amount  of  a  certain  commodity  in  stock. 
Also,  there  are  n  retailers  ( destinations  or  demand  nodes),  each  with  a  given  demand 
for  this  commodity.  The  unit  transportation  cost  from  each  warehouse  to  every  retailer  is 
known  data.  The  objective  is  to  ship  units  from  supply  nodes  to  demand  nodes,  such 
that: 

(i)  the  total  shipping  cost  is  minimized; 

(ii)  the  demands  of  retailers  are  satisfied; 

(iii)  no  more  units  leave  a  warehouse  than  there  are  in  stock; 

If  the  total  supply  equals  the  total  demand  we  have  a  balanced  transportation 
problem.  Actually,  we  can  assume  that  all  transportation  problems  can  be  formulated  as 
balanced,  adding  dummy  supply  (demand)  nodes  to  provide  (absorb)  the  needed  (extra) 
supply  (demand). 
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When  the  number  of  supply  nodes  is  small  compared  with  the  number  of  demand 
nodes  we  have  a  long  transportation  problem. 

When  there  are  limits  imposed  on  the  capacity  of  the  supply  from  the  different 
origins  to  the  different  destinations  we  have  a  capacltatad  transportation  problem.  This 
is  a  more  interesting  problem,  for  practical  purposes.  In  this  work  we  will  focus  on  the 
simplest  of  the  models,  defined  above,  with  many  more  demand  nodes  than  supply  nodes 
(long  version). 

The  formulation  of  a  transportation  problem  involving ’m'  origins  and  'n*  destinations 

is: 


min  EE  (Ill-la) 

1-1  j-i 

s.t  E11!  5  si.  1=1 .2 . m  (Hl-1b) 

J-i 

E  x,  =  d,.  1-1.2 . n  (line) 

i-t 

x,  i  0,  i=1f...,m;  J— 1, ,n  (III. id) 


where 

ci3  is  the  cost  per  unit  for  shipping  from  i  to  j; 

is  the  number  of  units  shipped  from  i  to  j; 
sx  is  the  available  supply  at  node  i; 
dj  is  the  demand  required  by  node  j. 
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The  above  model  has  m-  n  decision  variables  x, .  The  set  of  constraints  defined 
in  (lll.lb-c)  is  written  Ax  »  b  in  compact  form,  and  is  composed  of  m  +  n 
constraints. 

The  matrix  A  has  one  row  for  each  node,  and  one  column  for  each  arc  (i,j)  joining 
supply-demand  nodes.  Furthermore,  each  column  contains  two  non-zero  coefficients,  and 
they  are  precisely  -<-1  and  -1  (there  is  another  version,  called  "Generalized 
Transportation  Problem",  which  allows  for  coefficients  other  than  zero  and  one  in  the 
demand  constraints).  The  rank  of  A  is  m+n-1.  The  matrix  A  is  totally  unlmodular,  i.e. 
the  determinant  of  every  square  submatrix  formed  from  it  has  value  -1 , 0,  or  1 .  This  fact 
allows  for  integer  optimal  solutions  when  both  supplies  and  demands  are  integer  values 
(Bazaraa  eta!...,  1990) 

B.  TRANSPORTATION  TABLEAU 

A  convenient  form  of  representing  the  transportation  model  is  by  means  of  a  tableau 

where  the  rows  1,  2,  ....  m  represent  origins,  and  the  columns  1,  2 . n  represent 

destinations.  Each  cell  (i,j)  contains  the  flow  variable  x,  and/or  the  cost  c,  (Figure  1). 
This  representation  is  commonly  used  for  describing  particular  strategies  of  solution  to  the 
transportation  problem.  This  will  be  seen  particularly  in  Chapter  V.  Here  it  is  presented 
for  compactness. 

C.  GRAPH  AND  NETWORK  REPRESENTATION 

The  underlying  structure  in  a  transportation  problem  is  that  of  a  directed  network. 
More  specifically,  it  is  a  complete  bipartite  graph,  i.e.  the  set  of  nodes  is  partitioned  into 
two  subsets:  origins  and  destinations;  every  origin  is  connected  to  every  destination  (and 
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Figure  Transportation  Tableau 


vice-versa);  finally,  all  the  arcs  defining  the  problem  go  from  a  supply  node  to  a  demand 
node.  In  graph  terminology,  every  origin  is  adjacent  to  all  destinations  and  no  destination 
is  adjacent  to  any  origin.  (Note  that  a  pair  of  supply-demand  nodes  not  adjacent  to  each 
other  are  joined  by  an  arc  highly  penalized  in  cost).  The  schematic  representation  is  in 
Figure  2. 

In  order  to  complete  the  network  representation  of  the  graph  we  need  to  add  an 
additional  node.  The  resulting  network  has  ’m+n+1’  nodes.  This  translates  into  the  linear 
programming  model  as  a  new  artificial  variable.  We  do  not  need  to  put  costs  in  the 
originated  arcs,  since  the  artificial  variable  value  must  be  zero  in  any  feasible  solution 
(Bazaraa,  1990).  The  additional  node  plays  the  role  of  a  root  node. 
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A  basis  is  now  represented  by  a  rooted  spanning  tree  (i.e.  a  spanning  tree  plus  an 
extra  arc,  root  arc,  corresponding  to  the  artificial  variable).  Figure  3  represents  the 
associated  network  with  the  transportation  problem,  and  a  basic  feasible  solution. 

Since  every  destination  must  have  its  demand  satisfied,  the  solutions  to  a 
transportation  problem  have  at  least  one  positive  flow  arriving  to  each  demand  node.  This 
is  of  special  interest  in  long  transportation  models.  In  those  models,  at  optimality  the  total 
number  of  positive  flows  is  at  most  m+n-1 .  If  so,  then  we  have  that  at  most  (m+n-1)  - 
n  =  m  -  1  demand  nodes  will  be  supplied  by  two  or  more  supply  nodes  while  the 
remaining  at  least  n  -  m  +  1  demand  nodes  will  be  supplied  by  only  one  supply  node. 
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Figure  3.- 


Transportation  Problem.  Network  Representation  and  Basic 
Feasible  Solution. 


D.  DUAL  PROBLEM 

It  is  interesting  to  formulate  the  dual  of  the  transportation  problem.  Recall  that  in 
dual  problems  (Bazaraa  et  at,  1990),  there  is  a  dual  variable  corresponding  to  each 
constraint.  In  the  present  case,  we  establish  two  groups  of  dual  variables:  those 
associated  with  constraints  (III.  1b)  are  designated  u„  and  those  associated  with 
constraints  (I II.  1c)  are  designated  v, .  The  dual  problem  is  stated 

m  n 

£  siui  *  £ 

i-i  j.i 

s.t  U,  ♦  V,  ♦  c,*  =  c, 


u, ,  v,  unrestricted 


(III. 2a) 
(lll.2b) 
(111.2c) 


The  slack  variables  are  introduced  in  (lll.2b)  to  obtain  equality. 

The  primal-dual  relationship  can  be  explained  as  follows:  suppose  that  a  firm  Alpha 
has  to  transport  goods  as  defined  by  the  primal  problem.  The  firm  is  planning  to  hire  a 
smaller  firm,  Beta,  to  do  the  job  instead.  The  hiring  conditions  must  be  such  that  the  costs 
claimed  by  Beta  to  Alpha  can  never  be  bigger  than  those  defined  by  the  primal  setting 
(otherwise,  Alpha  would  be  willing  to  do  the  job).  So  Alpha  puts  up  the  following  proposal: 
Beta  will  receive  u,  dollars  for  each  unit  shipped  out  of  the  i-th  origin.  Beta  also  will 
receive  v(  dollars  for  each  unit  delivered  to  the  j-th  destination  (l!!.2a).  Constraints  (lll.2b) 
state  that  the  shipment  of  a  unit  from  i  to  j  cannot  be  greater  than  the  shipment  cost  of  a 
unit  in  the  primal  problem  (Weak  Duality  Property).  Within  this  setting.  Alpha  sets  Beta 
free  to  arrange  the  transportation  policy,  And  Beta  will  try  to  maximize  profits.  In  doing  so, 
the  best  profit  (optimal  objective  function  value)  that  Beta  can  produce  is  exactly  the  least 
expensive  (optimal)  solution  that  Alpha  could  come  to  by  itself  ( Strong  Duality  Property). 

Suppose  that  we  have  a  feasible  solution  x.  Since  (any)  one  of  the  primal 
constraints  is  redundant,  (any)  one  of  the  dual  variables  can  assume  an  arbitrary  value. 

One  way  to  check  for  optimality  of  x  is  by  checking  whether  the  primal  feasibility  and 
dual  feasibility  are  accomplished.  This  is  known  as  the  superviaor's  principle.  That  is, 
if  u,  v  are  the  dual  vectors  (variables)  associated  with  x,  the  fact  that  u,  v  are  feasible 
for  the  dual  problem,  automatically  implies  that  x  is  an  optimal  solution  for  the  primal 
problem. 

E.  OPTIMALITY 

In  the  transportation  problem  the  rank  of  the  matrix  A  is  m+n-1 .  To  apply  the  simplex 
method  we  need  full  rank.  We  solve  that  with  an  artificial  variable  (Bazaraa,  1990).  Then, 
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any  basis  will  consist  of  m+n-1  variables,  plus  an  artificial  variable.  In  feasible  solutions 
tiie  value  of  this  artificial  variable  is  zero. 

Recall  the  primal  (lll.1a*d)  and  the  dual  (lll-2a-c)  of  the  transportation  problem. 
Recall  also  that  the  Complementary  Slackness  Theorem  states  that  at  optimality,  if  a 
variable  is  positive,  then  the  corresponding  constraint  in  the  other  problem  must  be  tight. 
And  if  a  constraint  in  one  problem  is  not  tight,  then  the  corresponding  variable  in  the  other 
problem  must  be  zero. 

The  process  of  finding  an  optimal  solution  should  follow  the  next  general  steps: 

(i)  Start  with  a  feasible  solution  x; 

(ii)  Find  the  duals  (w,  v)  associated  with  x; 

(iii)  Check  if  the  dual  variables  are  a  feasible  solution  to  the  dual  problem,  or  price 
out  the  nonbasic  variables  and  determine  if  they  price  out  unfavourably  (i.e. 
they  are  not  eligible  to  enter  the  basis). 

In  the  described  process,  the  duals  can  be  computed  as  solution  to  the  system: 

[w  v)  B  =  CaT  (M-3) 

and  the  pricing  operation  of  the  nonbasic  variable  ’k’  is  performed  by  computing: 

Cu  -  [w  v]  Nj,  (111-4) 

the  special  structure  of  the  problem  makes  (111.4)  easy  to  compute,  since 

c,  -  lw  v]  Nk  =  c,  -  (w,  -  vp  (HI-5) 

The  previous  facts  are  easily  followed  using  the  network  structure  of  the  problem. 
Algorithms  like  the  simplex  network  and  GNET  use  them  very  efficiently.  The  simplex 
method  maintains  a  basic  feasible  solution  at  every  step.  Given  a  basic  feasible  solution, 
the  method  first  applies  tire  optimality  criteria  to  test  the  optimality  of  the  current  solution. 
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If  the  current  solution  does  not  fulfill  this  condition,  the  algorithm  performs  an  operation, 
known  as  pivoting,  to  obtain  another  basis  structure  with  a  lower(minimization)  or  identical 
cost.  The  simplex  method  repeats  this  process  until  the  current  basic  feasible  solution 
satisfies  the  optimality  criteria.  GNET  (Bradley  etal,  1 977)  is  a  network  simplex  algorithm 
that  uses  the  idea  of  basic  trees  described  previously  in  this  chapter,  together  with 
extremely  efficient  data  structures  to  represent  the  necessary  information  about  the 
problem.  For  a  description  of  network  simplex  algorithms,  see  Ahuja,  etal ,  1993. 

In  the  next  three  chapters,  we  consider  some  of  the  multilevel  methods  existing  in 
the  optimization  literature. 
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IV.  THE  DECOMPOSITION  METHOD 


A.  INTRODUCTION 

"Decomposition11  is  a  term  that  embraces  three  equivalent  procedures  for  dealing 
with  large  linear  programming  problems.  The  procedures  are  the  Dantzig-Wolfe 
Decomposition,  Benders’  Decomposition  and  the  Lagrangian  Relaxation.  The  usual 
environment  is  that  of  a  problem  having  a  special  set  of  constraints  (hopefully  very  easy 
to  handle),  together  with  another  set  of  bundle  constraints  that  make  the  problem  harder. 

The  three  methods  involve  systematic  computation  at  two  levels.  One  level  is  called 
The  Master  Problem.  The  other  is  called  The  Subproblem.  The  subproblem  works  on 
the  easy  set  of  constraints.  Normally,  the  subproblem  is  solved  quickly.  But  its  solution 
does  not  necesarily  satisfy  the  bundle  (harder)  constraints.  The  solution  of  the  subproblem 
provides  information  via  feedback  to  the  master  problem,  specifically  in  the  problem 
parameters.  The  modified  master  problem  originates  a  new  subproblem.  The  control  is 
passed  again  to  the  subproblem  level,  where  a  new  solution  is  produced.  The  process 
continues  until  it  converges  and  an  optimal  solution  is  found  satisfying  the  bundle 
constraints.  The  following  sections  will  cover  an  overall  description  of  the  various 
techniques.  The  goal  is  to  broaden  the  spectrum  of  multilevel  techniques  that  could  be 
used  to  produce  multigrid  approaches  to  optimization  problems. 

In  Appendix  A,  a  numerical  example  is  provided  showing  a  practical  application  of 
the  Dantzig-Wolfe  decomposition  scheme. 
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B.  DANTZKa-WOLFE  DECOMPOSITION 

Consider  the  following  linear  program: 


min  cx 

(IV.  la) 

s.t  Ax  *  b 

(IV.Ib) 

X  €  X 

(IV.  1c) 

Here,  (IV.  1c)  is  the  set  defined  by  special  constraints;  (IV.Ib)  represents  the  bundle 
constraints.  If  X  is  bounded  and  polyhedral,  then  any  point  in  X  can  be  expressed  as 


x  -  E  = 

j-i 

t 

with  i 

]-i 

and  A.j  i  0;  j  =  1,2 . t 

where  x(j)  are  the  t  extreme  (corner)  points  of  the  set  X. 

Substituting  in  (IV.1),  the  original  problem  is  now  expressed  in  terms  of  the  new 
variables  X,  as  follows: 

min  E  («®)  (,V  2a) 

J-i 

s.t.  E  (Ax®)  »  b  (W)  (IV‘2b) 

J*i 


E*j  -  1  <«>  (,v-2c> 

i-i 
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This  is  called  the  Master  Problem,  in  the  usual  terminology.  The  variables  w  and 
a  in  parentheses  are  the  dual  variables  associated  with  each  respective  set  of  constraints 
of  the  master  problem.  So  w  actually  is  a  vector  of  length  equal  to  the  number  of  rows  in 
A,  whereas  a  is  a  single  variable,  since  (IV.2c)  defines  only  one  constraint. 

At  this  point,  it  is  useful  to  write  the  dual  problem  associated  with  the  master 
problem: 


max  w  b  +  v  d 

(IV.3a) 

s.t.  w  (A  xa))  +  a  <  (cx,,,) 

(IV.3b) 

w,  a  unrestricted 

(IV.3c) 

Referring  back  to  the  Master  Problem,  trying  to  solve  (IV.2)  is  a  hard  task,  since  the 
number  of  variables  is  normally  very  large.  They  correspond  to  the  different  corner  points 
of  X,  and  finding  them  is  computationally  expensive,  perhaps  prohibitively  so. 

Now  consider  the  following  problem: 

max  (w  A  -  c)  x  (IV.4a) 

s.t.  x  e  X  (IV.4b) 

Note  that  this  problem  is  bounded  and  not  empty,  so  it  attains  an  optimal  solution 
at  one  corner  point  of  X.  Note  also  that  the  solution  solves  precisely  the  pricing  operation 

that  takes  place  when  applying  the  Simplex  method  to  (IV.2): 

I  [(Ax™)]  ] 

max  (Zj  -  Cj}  »  max  ]  (w,  a]  -  cx^  ) 

(IV.5) 
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where  the  index  runs  over  1,2 . t,  the  comer  points  of  X.  The  set  (IV.4)  defines  the 

Subproblem,  and  is  easily  solved  because  of  the  special  structure  of  its  constraints. 
Actually,  the  Subproblem  task  in  the  whole  context  is  to  take  advantage  of  the  special 
structure  of  the  constraints  defining  the  set  X  in  order  to  solve  the  pricing  operation  of  the 
Master  Problem  as  a  fast  linear  program. 

In  summary,  the  Dantzig-Wolfe  method  proceeds  in  the  following  way: 

(i)  Start  with  a  feasible  solution  for  the  master  problem  (IV.1); 

(ii)  Solve  the  associated  subproblem; 

(iii)  The  solution  to  the  subproblem  provides  the  coordinates  of  a  new  comer  point 
that  comes  into  play; 

(iv)  The  subproblem  also  provides  an  entering  variable  in  the  basis  of  the  Simplex 
Tableau.  After  pivoting,  the  basis  inverse,  dual  variables,  and  RHS  are 
updated; 

(v)  Now  we  have  an  improving  basic  feasible  solution  of  the  master  problem.  The 
set  of  X’s  provide  a  new  convex  linear  combination  which  is  a  feasible  solution 
(the  best  so  far)  to  the  original  problem  (IV.1); 

(vi)  Iterate  the  process  to  convergence  of  the  solution. 

Finally,  it  can  be  shown  (Bazaraa  et  al,  1990)  that,  at  each  iteration,  we  obtain  an 
upper  and  lower  bound  to  the  optimal  objective  function  of  (IV.1 ).  So  the  computations  can 
be  stopped  at  a  determined  level  of  accuracy  of  the  current  solution. 

A  numerical  example  of  the  practical  use  of  this  method  is  given  in  Appendix  B. 
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C.  BENDERS’  DECOMPOSITION 

Benders’  decomposition  scheme  performs  a  complementary  procedure  to  that  in 
Dantzig’s  method.  Here,  the  dual  is  really  the  problem  to  solve.  But  let  us  start  with  the 


primal.  Consider  the  following  problem: 

min  c  x  (IV.6a) 

s.t.  Ax  r  b  (W)  (IV.6b) 

Dx  >  d  (v)  (IV.6c) 

The  dual  to  this  problem  is 

max  w  b  +  v  d  (IV.7a) 

s.t.  w  A  +  v  D  £  c  (IV.7b) 

w  unrestricted,  v  >  0  (IV.7c) 


This  is  the  problem  to  be  solved  using  Benders’  method.  It  is  interesting  to  keep 
considering  it  as  dual,  in  order  to  view  it  in  terms  of  a  decomposition  scheme. 

Consider  w  as  a  parameter.  Then  restate  (IV.7)  as: 


max 

(w  unrestricted) 


max  (vd) 

s.t.  vD  s,  c  -  wA 
v  a  0 


(IV.8) 


Note  that  the  dual  of  the  inner  maximization  problem  in  (IV.8)  is: 


f  min  (c  -  wA)  x 
(  s.t.  Dx  *  d 
l  x  i  0 


(IV.9) 


Therefore,  at  optimality,  the  objective  function  of  (IV.9)  is  equivalent  to  that  of  the 
mentioned  inner  problem.  This  fact  allows  us  to  express  (IV.8)  as: 
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max  {wb  -  min  (c  -  wA)x  \  (IV.10) 

(w  unrestricted)  \  s.t  x  e  X  J 

Since  the  current  inner  minimization  problem  is  bounded  and  nonempty,  it  attains 

an  optimal  solution  at  one  of  the  extreme  points  of  X.  Suppose  that  X  has  t  of  these 
extreme  points,  and  name  them  in  similar  fashion  as  we  did  in  section  A.  Then  we  can 
formulate  the  following  statement,  equivalent  to  (IV.10): 

max  Jwb  ♦  min  (c  -  Wajx^  1  _  ()V11) 

(w  unrestricted)  \  ]  =  1,2 . t  j 

The  last  problem  can  be  restated  in  the  following  terms: 

max  z  (IV.  12a) 

s.t.  z  £  wb  +  (c  -  wA)  Xqj,  j  =  1,2 . t  (IV.  12b) 

z,  w  unrestricted  (IV.  12c) 

The  above  problem  (IV.12)  represents  the  Master  Problem  in  this  procedure.  This 
setting  (IV.12)  expresses  the  previous  problem  (IV.11)  without  the  two  “levels".  On  the 
other  hand  (and  this  complicates  things)  the  problem  defined  in  (IV.12)  has  t  constraints, 
one  per  corner  point.  Those  are,  in  practice,  too  many  constraints. 

Let  us  underline  the  parallelism  between  this  situation  and  the  situation  at  stage 
(IV.3)  in  the  Dantzig-Wolfe  procedure.  In  the  earlier  case,  the  situation  was  handled  by  a 
column-generating  procedure.  Here  we  will  proceed  on  a  row-generating  fashion.  We 
will  come  back  to  this  point. 

It  is  useful  to  form  the  dual  of  (IV.12).  To  facilitate  things,  express  (IV.12)  as  follows: 
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max  z  (!V.l3a) 

a.t  wiAxg,  -  b)  ♦  z  s.  cx^  (X)  (IV.l3b) 

w,  z  unrestricted  (IV.  13c) 


Again,  we  have  written  the  next  dual  variables  associated  with  the  constraints  in  the 
problem.  Next  we  can  formulate  the  dual  of  (IV.  13)  to  obtain: 

min  £  (cxgj)  Xj  (IV.14a) 

s  t.  £  (Axg  -  b)  X,  =  0  (IV.  14b) 

j*i 

£x,  =  i  (iv.  i4c) 

1-1 

Note  that  the  last  problem  is  precisely  the  Dantzig-Wolfe  approach  to  (IV.6),  where 
the  comer  points  correspond  to  the  set  X  defined  by  the  constraints  (IV.6c). 

A  way  of  alleviating  the  difficulty  of  dealing  with  a  large  number  of  constraints,  that 
arose  at  (IV.  12)  is  by  a  relaxation  technique.  It  is  based  on  the  following  fact:  if  we  solve 
(IV.  13)  to  optimality,  but  using  only  a  subset  of  the  constraint  /.13b),  then  the  optimal 
solution,  i.e.  (w,  z)  ,  attains  a  value  of  the  objective  function  that  is  an  upper  bound 
for  the  original  problem  (IV.2).  (It  is  easy  to  see  that  it  is  so,  because  the  set  of  values  of 
x  defined  by  the  subset  of  constraints  contains  the  feasible  region  defined  by  (IV.  12b), 
and  both  objective  functions  are  the  same).  So  if  the  pair  (w,  z)  found  is  such  that 
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it  is  feasible  for  the  constraints  (IV.  13b)  then  the  solution  is  optimal  to  the  original  problem. 
Again,  this  is  a  hard  task,  due  to  the  large  number  of  constraints. 

The  above  check  is  equivalent  to  checking  that 

i  S  Sb  .  m|o{(V«A>xf  (IV.  15) 

Problem  (IV.15)  is  called  The  Subproblem  in  this  decomposition. 

Summarizing  Benders'  procedure: 

(i)  Solve  the  LP  subproblem  (IV.15). 

(ii)  If  the  solution  satisfies  (IV.15)  then  we  are  done. 

(iii)  Otherwise,  let  xk  be  the  optimal  solution  to  (i).  Add  the  constraint 
z  s  wb  +  (c  -  wA)x(10  to  the  relaxed  problem  (i). 

(iv)  Reoptimize  until  some  relaxed  problem  gives  an  objective  value  z  equal  to  the 
optimal  value  in  (IV.15) 

D.  LAGRANGIAN  RELAXATION 

Lagrangian  relaxation  uses  the  idea  of  relaxing  the  explicit  (bundle)  constraints  by 
bringing  them  into  the  objective  function  with  associated  Lagrange  multipliers  w. 

So  if  our  original  problem  is  of  the  form  (IV.  1),  then  this  technique  relaxes  the 
problem,  restating  it  as  follows: 

min  cx  +  w(Ax  -  b)  (IV.16) 

S.t.  x  c  X  (IV.  17) 
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The  resulting  problem  is  celled  the  Legrengbm  Relaxation,  or  Lagrangian 
Subproblem  of  the  original  problem. 

If  we  define 

L(w)  =  Min  {  cx  +  w(Ax  -  b)  }  (IV.18) 

as  the  Lagrangian  Function,  then  the  following  holds: 

Lemma:  For  any  vector  w  of  the  Lagrangian  multipliers,  the  value  L(w)  is  a  lower  bound 
on  the  optimal  objective  function  of  the  original  optimization  problem. 

The  proof  can  be  written  in  abreviated  fashion  as: 

min  {  cx  :  A  x  =  b,  x  e  X  } 

=  min  {  c  x  +  w  (A  x  -  b)  :  A  x  =  b,  x  e  X  } 

>  min  {  c  x  +  w  (A  x  -  b)  :  x  e  X  } 

The  above  inequality  is  produced  when  the  elimination  of  the  constraints  Ax  =  b 
from  the  right  hand  side  above  cannot  cause  an  increase  in  the  value  of  the  objective 
function  (most  likely  the  value  will  decrease). 

Now,  if  we  want  the  sharpest  lower  bound  on  the  optimal  objective  value,  we  would 
need  to  solve  the  following  optimization  problem: 

L*  =  max  L(w)  (IV.19) 

W 

which  is  called  Lagrangian  Multiplier  Problem  (or  Lagrangian  Duat)  associated  with 
the  original  optimization  problem. 
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Theorem:  The  optimal  objective  function  value  L'  of  the  Lagrangian  multiplier  problem 
is  always  a  lower  bound  on  the  optimal  objective  function  value  of  the  original  problem. 
Proof:  Let  LS  be  the  set  of  all  the  optimal  objective  values  L(w).  The  maximum  is  a 
member  of  the  set.  Therefore  it  is  also  a  lower  bound  for  the  optimal  objective  of  the 
original  problem. 

So  we  have  the  following  relationship: 

L(w)  s  L*  z*  =  cx  (IV.20) 

The  above  leads  to  the  following  two  properties: 

Property  (a)  ( Optimality  testy.  Suppose  that  w  is  a  vector  of  Lagrangian  multipliers  and 
x  is  a  feasible  solution  to  the  optimization  problem  (IV.1)  satisfying  the  condition 
L  (w)  =  cx  .  Then  L(w)  is  an  optimal  solution  of  the  Lagrangian  Multiplier  Problem  and 
x  is  an  optimal  solution  of  the  optimization  problem  (IV.1). 

Property  (b):  If  for  some  choice  of  the  Lagrangian  multiplier  vector  w  the  solution  x  of 
the  Lagrangian  Relaxation  is  feasible  in  the  optimization  problem  (IV.1),  then  x  is  an 
optimal  solution  to  (IV.1)  and  w  is  an  optimal  solution  to  the  Lagrangian  Multiplier 
Problem. 

Hence  the  Lagrangian  relaxation  gives  a  lower  bound  to  the  optimal  value  for  the 
objective  function  of  optimization  problems.  This  is  precisely  its  primary  use. 
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TVMOflwn:  Suppose  that  we  apply  the  Lagrangian  Relaxation  technique  to  a  linear 
program  (LP)  defined  as  follows: 

min  cx 

s.t.  Ax  *  b  (w) 

Dx  =  q  (v) 

*  *  0  (P) 

by  relaxing  the  constraints  Ax  =  b.  Then  the  optimal  value  L'  of  the  Lagrangian 
multiplier  problem  la  equal  to  the  optimal  objective  function  value  of  (LP). 

Note  the  associated  dual  variables  w,  v.  Suppose  ( w  * ,  v  * ,  p  * )  are  optimal 
solution  for  the  dual  problem.  Let  also  x*  be  the  corresponding  primal  solution.  By  linear 
programming  theory  x‘,  w  ,  v\  p*  satisfy  the  dual  feasibility  and  complementary  slackness 
conditions: 


w*A  «■  v*D  +  p*  =  c  (W .21  a) 

p-x*  =  0  (IV.21b) 

p.  *  o  (IV.21c) 

Now,  let  us  write  the  Lagrangian  relaxation  applied  to  (LP)  for  L(p)  =  L(w*): 
min  cx  -  w'  (Ax  -  b)  •  (c  -  w‘  A)  x  +  w  b  (IV.22a) 

s.t.  D  x  =  q  (v)  (IV.22b) 

v  >  0  (P)  (IV.22C) 
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The  dual  feasibility  and  complementary  slackness  conditions  for  this  problem  are: 


vD  *  p  «  c  -  w*A 

(IV.23a) 

p’x  =  0 

(IV.23b) 

p  i  0 

(IV  .23c) 

The  solution  x*  is  feasible  for  the  Lagrangian  L(p)  when  p  is  substituted  by  w\ 
This  is  because  x*  is  feasible  for  (LP).  Furthermore,  it  satisfies  (IV.23).  Therefore  x*  is 
an  optimal  solution  for  L(w‘).  The  Lagrangian  Optimality  Property  implies  that  L(w')  is  the 
optimal  objective  function  of  (LP). 

Lagrangian  relaxation  is  a  general  solution  strategy  for  solving  mathematical 
programs  that  permits  us  to  decompose  problems  to  exploit  their  special  structure.  As 
such,  this  solution  approach  is  perfectly  tailored  for  solving  many  models  with  embedded 
network  structure. 

The  reviewed  Dantzig-Wolfe  decomposition,  Bender's  decomposition  and  Lagrangian 
relaxation  are  methods  that  can  be  considered  to  work  in  two  (or  more)  levels.  The 
importance  of  this  will  be  seen  when  the  economic  interpretation  of  one  of  them  (Dantzig- 
Wolfe)  is  presented  later  in  Chapter  (X. 
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V  AGGREGATION  METHOD 


A.  INTRODUCTION 

The  idea  underlying  this  approach  is  that  of  consolidating  'neighboring'  origins 
(destinations).  Baias  (1965)  uses  aggregation  to  produce  an  algorithm  of  specific 
application  to  the  solution  of  the  transportation  problem.  In  his  approach,  the  solution  to 
the  aggregated  problem  is  used  as  a  means  for  obtaining  some  information  that  allows 
for  a  simplified  version  of  the  original  problem.  Progress  can  be  made  by  considering  at 
each  step  only  a  certain  part  of  the  problem’s  data.  We  will  present  this  method,  due  to 
Baias,  posed  on  a  transportation  problem. 

The  scheme  consists  of  the  following  general  pattern: 

(i)  Simplify  (aggregate)  the  problem,  reducing  its  size  (this  could  be  considered 
an  auxiliary  step  for  (ii)) ; 

(ii)  With  the  information  obtained  from  (i),  construct  a  derived  problem,  which  is 
smaller  than  the  original  problem,  although  of  the  same  order  of  magnitude, 
and  solve  it; 

(iii)  Hopefully,  (ii)  will  solve  the  original  problem.  If  not,  some  iterative  process  is 
applied  that  gradually  increases  the  size  of  (ii).  (An  unfortunate  sequence  of 
iterations  could  end  up  reproducing  the  original  problem,  but  it  is  not  likely). 

Figure  4  is  a  representation  of  the  problem  to  be  solved.  In  the  figure,  (a)  is  the 
original  problem.  The  aggregation  process  groups  ’contiguous’  nodes  to  form  a  single 
aggregated  node.  This  gives  the  schematic  representation  (b).  (A  numerical  example  is 
offered  in  Appendix  B). 
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B.  NOTATION  AND  DEFINITIONS 

For  notation  purposes,  consider  the  original  set  of  supply  nodes  to  be  arranged  in 
m  groups.  Groups  are  denoted  as: 


Group  M,:  1,  2, 

Group  M2:  m,+1 ,  m,+2, 

•••»  nrij* 

Group  M3;  m2+1 ,  m2+2, 

•  ••»  ^3» 

Group  Mm.,;  m^+l,  rnm.,+2, 

•m  mm 

(V.1) 

Similarly,  destination  nodes  are  arranged  in  ‘n’  groups  as: 

Group  1,  2, 

•••» 

Group  N2:  n,+1,  n,+2, 

Group  N3;  n2+1 ,  n2+2, 

••••  ^3» 

Group  Nn.i;  nn.,+1,  nn.,+2, 

....  n„ 

(V.2) 

Let  H  be  the  set  of  all  indices  ’h’  from  (V.1),  and  K  the  set  of  all  indices  ’k’  from 
(V.2).  Then  the  original  problem  will  be  called  hereafter  Problem  (I),  and  is  stated  in  the 
following  terms: 


Problem  (1) 

min  E  E 

(V.3a) 

h«H  keK 

S.t.  E*hk  =  «h 

(V3b) 

keK 

E  Xhk  “ 

(V.3c) 

heH 

^k  *  0 

(V  .3d) 
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Next  we  define  the  aggregated  problem  aa  Problem  (t)  as  follows: 

(i)  Origin  nodes  belonging  to  group  M,  will  form  the  aggregated  node  i,  with  I 
being  the  set  of  ail  i; 

(ii)  Destination  nodes  belonging  to  group  N,  will  be  aggregated  as  node  j,  with  J 
being  the  set  of  all  j. 

The  costs,  supplies  and  demands  for  the  aggregated  problem  (II)  are  defined  as 
follows: 


(iii)  C,  =  min  (c* 

1  he  H,  k€  K} 

(V.4a) 

> 

n 

P 

0) 

•ar 

with  h  e  M, 

(V.4b) 

B,  =  £„ak 

with  k  e  Nj 

(V.4c) 

1C,  -  cj  £  a, 

for  all  h  e  M„  k  e  N,,  i  e  1,  j  e  J,  and  some  a 

(V.4d) 

Then,  the  aggregated  problem  (II)  is  stated  as 

Problem  (II) 

min  5Z  £ 

lei  jeJ 

(V.5a) 

s.t.  5^  =  A, 

JfJ 

(V.5b) 

=  Bj 
Id 

(V.5C) 

X|j  a  0 

(V.5d) 

Figure  4  is  a  representation  of  both  problems, 

L8t  x  be  a  nondegenerate  feasible  solution  of  (II).  Associated  with  Problem  (If) 
consider  a  new  problem,  called  the  partial  problem  related  to  X.  For  simplicity,  hereafter 
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we  will  refer  to  this  problem  as  Problem  (III),  implicitly  assuming  it  is  related  to  a  flow 
coming  from  (II).  The  problem  is  formed  according  to  the  following 

RULE:  Consider  only  those  flows  in  the  original  Problem  (I)  going  from 

groups  M,  to  groups  N,  such  that  xtj  >  0  - 

Better  than  stating  Problem  (III),  the  reader  is  encouraged  simply  to  follow  the 
example  in  Appendix  B. 

Given  the  arrangement  of  Problem  (I)  it  is  easy  to  see  that  every  component  Xn  of 
the  flow  X  has  an  associated  block  of  flows  in  Problem  (I).  Those  are  the  flows  going  from 
to  N,.  So,  a  total  of  mxn  blocks  of  flow  form  the  whole  Problem  (II).  The  above  rule 
reduces  the  size  of  the  problem  to  be  solved,  by  excluding  all  those  blocks  of  flows 
associated  with  zero  flow  in  the  solution’s  components  to  the  Problem  (II).  This  is 
particularly  interesting  in  long  transportation  problems,  where  a  lot  of  flows  are  at  level 
zero  in  the  optimal  solution. 

Clearly,  we  can  establish  a  1-1  mapping  between  feasible  and  nondegenerate 
solution  to  (II)  and  problems  (III).  (This  is  because  a  solution  of  such  class  for  (II)  is 
defined  by  the  flow  values  corresponding  to ’m  +  n  -  1'  nonzero  variables). 

Furthermore,  for  every  feasible  solution  to  problem  (II)  there  exists  a  feasible 
solution  to  problem  (III).  It  suffices  to  consider  the  flow 


x 


hk 


AiBJ 


Also  any  feasible  solution  to  a  Problem  (III)  is  a  feasible  solution  to  the 
corresponding  Problem  (I).  (Problem  (III)  is  more  restricted  than  (I)). 


41 


(a)  Original  Problem 


(b)  Aggregated  Problem 


Figure  4 Balas  ’  Node  Aggregation. 
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C.  OPTIMALITY:  RELATIONSHIP  BETWEEN  PARTIAL  AND  ORIGINAL 
PROBLEMS. 

Here  we  use  the  complementary  slackness,  expressed  in  terms  of  the  slack 
variables  (Chapter  II).  The  resulting  condition  is: 

Dhk  =  c*,,  -  u„  -  vk  £  0,  for  valid  values  of  h,k  (V.6) 

where  Dhk  represents  a  slack  variable,  and  uh ,  vk  are  duals.  The  expression  (V.6) 
will  be  zero  for  positive  flows,  and  greater  than  zero  for  flows  equal  to  zero. 

Let  D’hk  be  the  values  of  the  slack  variables  in  those  blocks  not  in  the  optimal 
solution  to  (II).  Then 

Theorem  (I):  An  optimal  solution  to  problem  (ill)  is  also  optimal  for  problem  (I)  if  and  only 
if  D’hk  >  0. 

Proof:  Let  x'  be  an  optimal  solution  to  (III).  Then  it  is  feasible  solution  for  (I).  And  since  it 
satisfies  the  complementary  slackness  conditions  it  is  also  an  optimal  solution  for  (I). 

Now  suppose  x  is  an  optimal  solution  for  problem  (I)  and  not  for  problem  (III).  Then 
there  will  be  some  positive  flow  x*,,  not  in  the  optimal  solution  for  (III).  The  corresponding 
slack  variable  D’hk  will  be  less  than  zero,  which  contradicts  the  assumption  in  the  theorem. 
Next  is  an  algorithm  suggested  by  Balas  (1965). 

Algorithm: 

(1)  From  Problem  (I)  formulate  Problem  (II); 

(2)  Solve  Problem  (II); 

(3)  From  the  optimal  solution  to  (II)  formulate  Problem  (III); 
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(4)  Solve  Problem  (III); 

(5)  Check  if,  at  this  point,  the  solution  satisfies  Theorem  (I). 


(i)  if  solution  to  (III)  is  optimal  for  (I)  stop; 

(ii)  if  not,  form  a  new  Problem  (til)  adding  those  blocks  in  (III)  violating 
Theorem  (I). 

(6)  Improve  the  solution  obtained  in  (4)  so  as  to  make  it  optimal  for  this  new 
partial  problem;  Go  to  (5). 

In  Appendix  B  a  numerical  example  develops  in  practical  terms  the  preceding 
algorithm. 

Balas’  aggregation  is  a  two-level  method  to  be  applied  only  to  the  solution  of 
transportation  problems.  In  the  next  chapter,  more  general  multilevel  methods  are 
presented  to  solve  more  general  network  flow  problems. 
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VI  SCALING 


A.  INTRODUCTION 

Scaling  techniques  give  the  best  worst-case  running  time  for  many  of  the  network 
optimization  problems.  They  are  computationally  effective  in  cases  where  reoptimization 
from  a  good  starting  solution  is  more  efficient  than  solving  the  problem  from  scratch. 
Gabow  (1 984)  demonstrates  that  under  a  "similarity  assumption"  (that  the  logarithm  of  the 
largest  magnitude  in  the  data  is  of  the  order  of  the  number  of  nodes)  scaling  could  be 
used  to  advantage  in  designing  theoretically  efficient  algorithms  for  a  broad  variety  of 
network  optimization  problems. 

Scaling  was  introduced  by  Edmonds  and  Karp  (1972).  Using  scaling,  the  first 
polynomial  time  algorithm  for  the  minimum  cost  problem  was  produced.  The  algorithm  of 
Edmonds  and  Karp  uses  scaling  on  the  right  hand  side  repeatedly  to  employ  the  out-of- 
kilter  method  iteratively.  Between  1972  and  1984,  there  was  little  interest  in  employing  the 
scaling  approach  of  Edmonds  and  Karp  in  actual  network  flow  computation.  In  spite  of 
favorable  asymptotic  worst-case  performance  it  was  widely  presumed  that  algorithms 
employing  data  scaling  would  not  be  nearly  as  fast  in  practice  as  the  network  simplex 
method.  Since  1985,  research  has  been  extensive.  Researchers  now  recognize  that 
scaling  techniques  have  great  theoretical  value  as  well  as  potential  practical  significance 
(see,  for  example,  Ahuja,  et  a) ,  1993). 
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B.  NOTATION  AND  DEFINITIONS 

Let  G  =  (N,  A)  be  a  (directed)  network  defined  by  a  set  N  of  ’n*  nodes  and  a  set  A 
of 'm'  directed  arcs.  The  following  notation  holds: 
d(i):  the  demand  of  node  i; 

Cjj  :  the  cost  associated  with  the  arc  (i,j); 

C  :  the  largest  cost; 

:  the  capacity  of  the  arc  (itj); 

U  :  the  largest  capacity; 

r,,  :  the  residual  capacity  of  the  arc  (i,j). 

A  pseudoflow  x  is  a  function  x:  A  ->  R  satisfying  only  the  capacity  and 
nonnegativity  constraints  (it  need  not  satisfy  the  mass  balance  constraints). 

The  Imbalance  of  node  i  is  given  by 

o(i)  *  d(i)  *  £  x, ,  -  £  x, 

*  d.l)eA|  fl:  (IJ)eA) 

all  i  e  A  (VI.D 

e(i)  is  called  excess  ( deficit)  if  it  is  greater  (lesser)  than  0. 

The  set  of  nodes  with  excess  (deficit)  is  denoted  E  (or  D). 

Next  we  define  the  e-optimality  of  a  pseudoflow  x.  This  concept  is,  independently, 
due  to  Bertsekas  (1979)  and  Tardos  (1985).  Figure  5  is  a  kilter  diagram  that  graphically 
compares  e-optimality  and  the  usual  optimality. 
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Figure  5.- 


KUter  Diagrams  showing:  (a)  Optimality;  (b)  e-Optimality. 


A  pseudoflow  x  is  said  to  be  e -optima/  (for  some  e  >  0)  if  for  some  set  of  node 
potentials  n  the  pair  (x,  it)  satisfies: 


c,K  >  €  -  X,  »  0 

~€  £  C|*  2*  €  -  0  S  X,  S  u, 

c,"  <  -e  -  X,  «  u, 


(VI. 2a) 
(Vl.2b) 
(VI  .2c) 


In  terms  of  the  residual  network,  the  pseudoflow  x  is  said  to  be  e-optimal  (for  some 
e  >  0)  if  for  every  arc  (i.j)  in  the  residual  network  G(x): 

a*  a  -€  (VI-3) 
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C.  COST  SCALING 

As  its  name  suggests,  cost  scaling  is  an  algorithm  approach  that  applies  scaling  to 
the  costs.  Assume,  without  great  loss  of  generality,  that  the  costs  are  integer.  Then: 
Theorem  W-f  (Bartsakas,  1986):  For  a  minimum  cost  flow  problem  with  integer  costs, 
any  feasible  flow  is  t-optimal  whenever  e  2  C  (largest  cost).  Moreover,  if  e  <  1/n  then 
any  t-optimal  feasible  flow  is  an  optimal  flow. 

Proof :  Let  x  be  any  feasible  flow  and  let  n  =  0.  Consider  the  residual  network  G(x).  Then 
for  every  arc  (i,j)  in  G(x)  cij  *  Cij  z  -c  .therefore  x  is  t-optimal  for  e  =  C. 

Now  consider  an  t-optimal  flow  x*  with  e  <  1/n.  Let  it  be  the  node  potentials 
associated  to  that  t-optimal  flow.  Let  W  be  any  directed  cycle  in  the  residual  network 
G(x').  Then,  by  (VI.3) 

£  c,"  i  -c  n  >  -1 

WJ>«W) 

But  the  costs  are  integers,  and  this  implies  that  the  above  sum  is  nonnegative. 
Now,  by  Corollary  11-2, 

E  c«  *  0 

l  U.j)€W) 

Therefore  W  is  not  a  negative  cost  cycle.  Hence,  by  Theorem  11-1  x'  must  be 
optimal. 

A  node  i  is  said  to  be  an  active  node  when  its  imbalance  is  positive.  An  arc  (i,j) 
adjacent  from  an  active  node,  and  belonging  to  the  residual  network,  is  said  to  be  an 

admissible  arc  if  e  s  c*3  <  o  .  In  Figure  6  for  an  arbitrary  arc  (i.j),  (a) 

2 

represents  a  situation  of  optimality,  while  (b)  illustrates  admissibility. 
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Figure  6.-  Kilter  Diagrams  showing:  (a)  Optimality;  (b)  Admissibility. 

Next  we  describe  a  generic-cost  algorithm  for  minimum  cost  flows  problems  which 
works  in  two  stages:  first,  it  converts  an  c-optimal  flow  into  a  (1/2  e)-optimal  pseudoflow; 
then  gradually  converts  this  pseudoflow  into  a  flow,  maintaining  the  (1/2  e)-optimality. 
Figure  7  shows  the  algorithm,  while  Figures  8  and  9  are  graphic  representation  of  the 
successive  steps  on  an  example.  Figure  8  starts  with  a  situation  determined  by  the  first 
residual  network,  and  e  =  8.  The  order  of  the  operations  performed  has  been  determined 
to  satisfy  the  example’s  purpose.  Note  that  the  values  of  the  node  potentials  remain 
constant  until  Figure  9(d),  when  a  situation  occurs  in  which  the  active  node  has  no 
associated  admissible  arc. 

In  the  example,  the  demands  of  the  nodes  can  be  computed  using  formula  (VI.1) 


as  follows: 


Algorithm  COST  SCALING 


BEGIN 

it  :=  0;  e  :=  C; 
x  is  any  feasible  flow 

WHILE  e7>  1/N  DO  BEGIN 

FOR  every  arc  (ij)  in  the  network  DO  BEGIN 
IF  positive  reduced  cost  then  xtJ  :=  0; 

ELSE  IF  negative  reduced  cost  then  xu  :=  utj; 

ELSE  leave  the  flow  on  that  arc  unchanged; 
end;  (for  every  ..} 

FOR  all  nodes  in  the  network  compute  node  imbalances  e(i); 

WHILE  the  network  contains  an  active  node  DO  BEGIN 
Select  an  active  node  ’i’; 

IF  there  is  any  admissible  arc  (i,j)  in  the  Residual  Network  then  begin 
push  8  =  min  (e(i),  rtj  }  units  of  flow  from  T  to  ]'; 
update  imbalances; 

end; 

ELSE  n(i)  :=  n(i)  +  e/2 
end;  (while  the  network  ...} 

£  =  e/2 

end;  (while  ...} 


END; 


Figure  7.-  Algorithm  Cost  Scaling. 


SO 


d(1)  =  e(1)  - 10  +  0  =  >10; 
d(2)  a  e(2)  -  35  +  10  =  5; 
d(3)  =  e(3)  -  0  +  30  =  20; 
d(4)  =  e(4)  -  0+  5  =  -15; 


The  imbalances  for  step  (f)  are  calculated  in  the  same  fashion: 

e(1)  =  d(1)+  0-0  =-10; 
e(2)  =  d(2)  +  0-0  =  5; 
e(3)  =  d(3)  +  0-20=  0; 
e(4)  =  d(4)  +  20  -  0  =  5. 

D.  CAPACITY  SCALING 

Capacity  scaling  produced  the  first  polynomial-time  algorithm  for  the  minimum  cost 
problem  (Edmonds  and  Karp,  1972).  We  present  an  algorithm  that  scales  capacities  due 
to  Orlin  (1988).  It  uses  the  concept  of  the  A  -Residual  network  G(x, A),  defined  to  be  the 
subgraph  of  the  residual  network  G(x),  induced  by  the  arcs  (forward  or  reverse)  having 
a  residual  capacity  of  at  least  A  units.  For  sake  of  clarity  let  us  define  also: 

A-Exceas  Node .  A  node  having  an  excess  of  at  least  A  units; 

A-Deftctt  Node:  A  node  having  a  deficit  of  at  least  A  units; 

S(A)r.  Set  of  A-Excess  Nodes; 

T(A).  Set  of  A-Deficit  Nodes. 
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Figure  9. 


Generic  Cost  Scaling  Algorithm.  Example  (b) 
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The  technique  used  is  to  send  flow  from  a  node  of  S(A)  to  a  node  of  T(a).  The  flow 
is  sent  along  a  direct  path  defined  in  the  A-residuai  network.  Such  a  technique  guarantees 
that  every  time  a  flow  is  sent,  it  will  be  sent  in  "packages"  of  A  units. 

The  algorithm  maintains  nonnegative  reduced  costs  in  the  A-residual  network 
( reduced  cost  optimality.  A  pseudocode  is  showed  in  Figure  10. 

In  case  that  more  than  one  path  is  available,  we  choose  the  most  favorable  (that 
with  a  minimum  reduced  cost). 

Each  scaling  phase  of  the  algorithm  is  identified  by  the  value  of  the  scale  parameter 
A.  The  scale  parameter  is  halved  when  each  phase  is  finished.  Within  0(log  U)  scaling 
phases,  A  =  1 ,  and  by  the  requirement  that  the  data  be  integer,  every  node  imbalance  will 
be  zero  at  the  end  of  this  phase.  At  this  point  G(x,  A)  is  identical  to  G(x).  Since  every  arc 
in  G(x,  A)  satisfies  the  reduced  cost  optimality  conditions,  the  flow  obtained  is  optimal. 
Figure  10  shows  pseudocode  for  the  algorithm. 

Figures  11  through  14  describe  the  capacity  scaling  algorithm  on  a  simple  network 
representing  a  transportation  problem  with  two  supply  nodes  and  two  demand  nodes.  Two 
additional  nodes  (super-source,  s,  and  super-sink,  t)  are  included.  The  algorithm  is 
represented  step-by-step.  Figure  1 1  is  a  description  of  the  simplified  data  to  appear  on 
Figures  12  through  14.  So,  at  every  step  three  different  pictures  summarize  the  network 
status.  On  the  left  hand  side  the  real  network  is  shown,  with  data  of  flows,  capacities  and 
costs  of  each  arc.  Next  to  each  node  a  number  appears  when  there  is  an  excess  or  deficit 
associated.  The  following  describes  the  main  features  used  in  the  graphics: 

In  the  real  network  (left  hand  side)  thick  lines  represent  positive  flow,  while  dotted 
lines  mean  zero  flow.  Next  to  the  graph  the  current  status  of  the  sets  S(A)  and  T(A)  is  also 
shown. 
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The  two  small  networks  represented  on  the  right  hand  side  are  the  residual  (upper) 
and  A- residual  (lower),  respectively.  In  those,  continuous  lines  represent  direct  arcs,  and 
dotted  curved  lines  represent  reverse  arcs. 

Thick  lines  in  the  A- residual  network  correspond  to  the  next  flow  movement  from  a 
node  another  node.  Two  different  circumstances  motivate  flow  interchange: 

(i)  Existence  of  a  path  in  the  A-residual  network  connecting  a  node  in  S(A)  with 
a  node  in  T(A),  as  in  Figures  12(a),  13(d-e)  or  14(e); 

(ii)  Existence  of  an  arc  with  negative  reduced  cost  in  the  residual  network,  as  in 
Figures  12(c)  or  14(d).  In  such  cases,  the  value  of  the  reduced  cost  is  shown 
in  the  center  of  the  arc.  Recall  that  those  arcs  will  drive  flow  to  saturation,  so 
that  only  arcs  with  nonnegative  reduced  costs  remain  in  the  (residual) 
network. 

Nodes  with  excess  or  deficit  are  filled  in  white,  while  all  others  are  filled  in  black. 

The  following  are  the  noteworthy  situations  in  the  execution  of  the  example: 

Figure  12(b):  no  direct  path  is  available  connecting  S(A)  and  T(A).  Therefore 
the  parameter  scale  must  be  halved. 

Figures  12(c)  and  14(d):  after  halving  the  parameter  scale,  arcs  with  negative 
reduced  costs  presumably  occur  in  the  residual  network.  This  forces  flow  to 
be  sent  along  these  arcs  to  bring  them  to  saturation. 

Figure  14(e):  the  node  's’  has  been  isolated  in  the  residual  network,  with  no 
imbalance.  Thereafter,  this  node  cannot  be  accessed  in  any  A*residual 
network.  The  distance  labels  to  this  node  will  be  infinite.  This  situation  is 
consistent  with  the  requirement  of  saturating  all  the  arcs  going  to  the  super¬ 
sink. 
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Algorithm  Capacity  Scaling 
begin 

n  :=  0; 

A  :=  2*»  u; 

WHILE  A  >1  DO  BEGIN 

FOR  every  arc  (i,j)  in  G(x)  DO  BEGIN 
IF  r0  >  A  AND  c K0  (  0  then  begin 

send  rtJ  units  of  flow  along  arc  (i,  j);  {saturate  arc  (i,j)} 
update  x  and  the  imbalances  e 
end; 

S( A)  =  (i  e  G(x,  A),  such  that  e(i)  >  A  / 

T(A)  =  (i  6  G(x,  A),  such  that  e(i)  <  -  A  } 

WHILE  (S( A)  *  9  AND  T( A)  *  9) 

AND  (there  exists  a  path  from  S( A)  to  T( A))  DO  BEGIN 

select  a  node  k  e  S( A)  and  a  node  l  6  T(A) 
determine  shortest  path  distances  d  from  node  k  to  all  other  nodes 
in  G(x,  A)  with  respect  to  the  reduced  costs  c*. 

Select  a  node  q  €  T(A);  Let  P(k,  q )  be  the  shortest  path  from  k  to  q 

Augment  A  units  of  flow  along  the  path  P(k,  q) 

update  n  =  n  -  d 

update  x,  S( A),  T(A),  and  G(x,  A); 

end;  {while} 

A  =  A/2 

end  {while} 


Figure  10 l-  Algorithm  Capacity  Scaling 
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Scaling  techniques  maintain  certain  parallelism  with  multigrid,  that  will  be  proposed 
•n  the  “Conclusions"  section  in  Chapter  IX.  Also,  it  is  remarkable  that  scaling  suggests 
a  more  abstract  concept  of  a  grid,  represented  in  the  scaling  parameter.  But  this  concept 
requires  a  previous  exposure  to  the  concept  of  grids,  which  is  deferred  to  Chapter  VIII. 

In  the  next  chapter,  we  present  a  methodology  that  has  been  used  in  dynamic 
systems  control  problems.  It  is  known  as  the  Perturbation  Method,  and,  although 
somewhat  apart  from  the  general  path  of  this  research,  the  fact  that  aggregation  is  a  basic 
mechanism  in  the  development  of  the  perturbation  method,  together  with  the  special 
treatment  it  gives  to  some  linear  programs  is  of  special  interest  since  it  presents  more 
insights  into  the  process  of  building  up  a  multigrid  approach  to  optimization  problems. 
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Figure  12 Example:  Capacity-Scaling  Algorithm  (I) 
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Figure  14. 


Example:  Capacity-Scaling  Algorithm  (111) 


VII  PERTURBATION  METHOD 


A.  PERTURBATION 

The  main  idea  of  the  perturbation  method  is  to  transform  an  original  problem  into 
a  simpler  one  -  called  the  “reduced  problem*  •  .  The  original  problem  will  then  be 
considered  a  “perturbation*  of  the  reduced  problem.  The  perturbations  are  supposed  to 
be  small,  and  their  magnitude  is  characterized  by  a  parameter  e .  We  will  consider  only  the 
case  of  linear  programs. 

Consider  a  mathematical  program  F‘(0): 


min  CoT  x 
s.t.  AqX  i  0 


(VII.1) 


where  x  is  an  n-vector  and  Aq  is  an  mxn  matrix  of  constraints,  and  consider  the 
program  F'(e):  . 


min  c  T(e)  x 

s.t.  A(e)  x  2  0 


(VI  1.2) 


The  case  of  study  is  when  we  need  to  solve  a  problem  type  (VII. 2)  for  a  particular 
value  of  the  parameter  e.  Suppose  that  a  direct  solution  to  (VII.2)  is  difficult,  but  a  solution 
to  the  "reduced  program*  (VII.1)  is  comparatively  simple.  Then  it  is  reasonable  to  solve 
the  original  program  by  successive  approximations,  using  a  procedure  in  which  the  first 
phase  is  the  solution  to  the  reduced  program,  and  the  second  phase  is  the  computation 
of  a  correction  to  it. 
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B.  SIMPLICITY  AND  PROXIMITY. 

The  success  of  the  perturbation  method  relies  on  two  factors:  simplicity  and 
proximity. 

Linear  programs  are  the  simplest  dass  of  programs.  Within  this  class,  complexity 
increases  with  the  problem  size.  But  for  any  fixed  size,  obtaining  a  solution  can  be 
simplified  if  one  uses  the  specific  structure  of  the  programs  considered.  Given  that  both 
the  original  and  the  reduced  problem  belong  to  the  same  class  of  problems  (i.e.  linear 
programs),  the  next  factor  to  consider  in  order  to  determine  simplidty  will  normally  be  the 
possibility  of  admitting  direct  aggregation  or  decomposition,  discussed  below. 

A  formal  measure  of  proximity  of  the  original  to  the  reduced  problem  is 

A  =  sup  [supxeX|c(x)-Co(x)|,  supXfX  I  A,  (x)  -Aa.(x)  j ,  1=1 . w]  (VII.3) 

Generally  speaking,  the  smaller  A  is,  the  doser  the  solution  to  the  original  problem 
will  be  to  that  of.the  reduced  problem.  One  would  like  to  express  the  error  of  the  solution 
approximation  as  a  function  of  the  proximity. 

Some  types  of  structure  adrr  direct  simplification  are  considered  next.  They  can 
be  classified  as: 

(a)  Directly  decomposable  programs; 

(b)  Directly  aggregatable  programs; 

(c)  Directly  aggregatable  and  decomposable  programs. 
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1.  Directly  Decomposable  Programs 

Let  a  program  have  the  form 


min  +  C2Z2  +  •••  *  C|[Z|| 


(VII.4) 


where  zt  =  fi(xi)  ,  the  different  x1  being  vectors  of  disjoint  subspaces  of  the 

global  space  containing  the  vectors  x  satisfying  the  linear  program  (VIM).  And  let  the 
objective  function  be  monotonically  increasing  (i.e.  the  c’s  positive).  Then  to  solve  this 
program  it  is  sufficient  to  solve  k  independent  subproblems 


min  f,(x  *) 

s.t.  x'eX,,  s  =  1,  2 . k 


(VII.5) 


A  linear  program  is  close  to  being  directly  decomposable  when  it  has  the  form 


min  cz  +  ec'x 

s.t.  ApZ  +  e  A,  x  >  0  (VII.6) 

where 

x  -  (x1.  x2 . x“) 

z,  *  z,(x  *),  x  •  e  X, 

These  are  problems  in  which  the  state  of  each  subsystem  is  determined  by 
its  “own*  vector  x*,  and  the  activities  of  each  subsystem  depend  weakly  on  those  of 
other  subsystems. 
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Those  are  problems  of  the  form 


min  e  z 

s.t.  Az  £  b  (VII. 7) 

where  z  =  z(x),  x  is  n-dimensional  vector. 

Suppose  one  can  solve  the  aggregated  problem 

min  e  z 
s.t.  g(z)  >  0 

If  z*  is  a  solution  to  the  aggregated  problem  then  x  *  *  z  1  ( z  * )  ,  when 
solvable,  is  a  solution  to  the  original  problem. 

EXAMPLE.  The  following  simple  transportation  problem  (represented  in  Figure 
15): 

min  3  (x„  ♦  x12)  ♦  2  (x13  +  x14)  *  4  (x*  +  x^)  ♦  5  (x*,* x *) 

S.t.  ♦  X^j  *  x13  +  x^  =  10 

*21  +  *22  +  *23  +  *24  *  14 

X,1  *  Xj,  =8 

X12  +  *21  *  5 

X13  "■  *23  ~  ^ 

X14  +  *24  «  7 

X|  *  0 
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the  parentheses  above: 


*11 

*  *11 

+  *12 

*12 

=  *13 

*  *14 

*21 

=  *21 

^  *22 

*22 

=  *23 

-  *24 

This  produces  the  "aggregated"  problem, 
min  3;1t  +2  z12  +  4  z21  +5 

s.f. 


*11  +  2  Zyz 

+  4  z21 

*11  +  *12 

=  10 

*21  +  *22 

=  14 

*11  +  *21 

=  13 

*12  +  *22  =  ^ 
2  0 
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In  this  cue,  we  aggregated  pairs  of  demand  nodes  having  the  same  transportation 


costs  per  unit.  The  solution  to  the  aggregated  problem  is: 


*11 

-  0 

*12 

s  10 

*21 

-  13 

*22 

=  1 

and  the  optimal  value  of  the  objective  function  is  77. 


Now  if  we  solve  the  system 


*11 

+ 

*12 

- 

0 

*13 

*14 

= 

10 

*21 

+ 

*22 

- 

13 

*23 

♦ 

*24 

s 

1 

*11 

♦ 

*21 

m 

8 

*12 

*22 

= 

5 

*13 

+ 

*23 

- 

4 

*14 

*24 

= 

7 

we  can  obtain  the  following  solution  (not  unique): 


*11  *  *12 

-  0 

*13  = 

*14  = 

£ 

ii 

00 

*22  -  5; 

*23  = 

■*524  =  0; 

which  can  be  verified  to  solve  the  original  transportation  problem. 


3.  Directly  Aggregatable  and  Decomposable  Problama 

Consider  the  problem 

min  c,  z,  +  Cj  Zj  +  ...  +  c*  z„ 

s.t.  A(zv  Zj . zj  >  0,  (VII.8) 

where  z,  =  z,(x‘),  x*  e  X,,  s  =  1,2,...,k.  Here  the  objective  function  does  not  have  to  be 
monotonically  increasing.  One  can  obtain  solutions  zs*  to  the  aggregated  problem, 

min  c  z 

s.t.  g(z,,  z, zj  >  0  (Vll.9) 

If  there  exist  solutions  x*'  of  the  equations  z3  (x“)  *  zs*  then  the 

vector  x*  =  (x8*)  is  a  solution  to  the  general  problem. 

EXAMPLE:  Depicted  in  Figure  16  is  a  transportation  problem  with  this 
structure.  For  sake  of  simplicity  we  avoid  data  in  this  example.  It  consists  of  two 
"independent"  transportation  subproblems,  each  of  them  with  directly  aggregatable 
structure.  This  structure  is  granted  by  a  relationship  between  costs  parallel  to  that  of  the 
example  in  section  B.2  (i.e.  costs  from  a  node  to  successive  and  disjoint  pairs  of  adjacent 
nodes  are  equal). 

To  solve  the  problem  on  the  left  we  solve  to  optimality  each  of  the  two 
independent  subproblems.  To  solve  each  subproblem,  we  can  take  advantage  of  the 
special  structure  determined  by  the  costs  relations  above. 
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Supplies 


Demands 


Supplies 


Demands 


(a)  original  problem  (b)  related  problem 


Figure  16.-  Directly  Aggregatable  and  Decomposable  Problem. 

The  details  of  such  a  problem  are  difficult  and  tedious,  and  it  is  felt  that  the 
underlying  ideas  may  be  more  easily  appreciated  using  the  schematic  presentation  in 
Figure  16  than  by  exhaustive  analysis. 

C.  OPTIMAL  BASIC  SET  INVARIANCE 

Let  F'(e)  be  the  linear  program  defined  as 

min  c  T(e)  x 
s.t.  A(c)  x  =  b 

x  2  0  (VII.10) 
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with  corresponding  reduced  problem 


min  CoTx 

s.t  AqX  - 

x  i  0 

Let  D’(0)  be  the  dual  problem  associated  with  F'(0),  i.e. 

max  boTX 

S.t.  Aq  1  S  Pq 


(Vll.11) 


(VII.  12) 


Suppose  that  x0\  Xq  are  optimal  solutions  to  (VII.  11)  and  (VII.  12),  respectively. 

and  furthermore,  they  are  unique.  Then  there  exists  a  submatrix  B  of  the  matrix  A,,,  such 
that 


Bxb  =  bw 

(VII. 13) 

o 

A 

*  B 

X 

(VII.  14) 

BTXo  =  CB 

(VII. 15) 

N  TXo  >  CN 

(VII. 16) 

where 

xB'  is  an  m-vector  of  basic  variables,  xB*  =  (x-f,  jej0) 

J0  is  the  optimal  basic  index  set  (used  next), 

B  is  {A,,  jeJ0}  (submatrix  formed  by  columns  associated  with  the  optimal 

basic  index  set), 

N  's  |A|.  j«J0} 

C3  'S  (Cq.  jeJ0} 
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'S  {Co,,  ifj0} 


CN 

The  condition  (Vtl.16)  guarantees  the  uniqueness  of  the  solution;  (VII.14)  is  the 
nondegeneracy  condition  and  guarantees  the  uniqueness  of  the  solution  Xq  of  the  dual 
problem. 

The  optimality  conditions  for  the  perturbed  program  are; 


A(€)x(€)  =  b(e) 

(VII. 17) 

x(e)  a  0 

(VII. 18) 

AT(C))(€)  i  c(c) 

(VII.  19) 

XT(€)[AT(C)X(€)  -  C(c)l  =0 

(VII.20) 

here,  {VII. 17)  and  (VII. 18)  express  the  primal  and  dual  feasibility;  (VII.20)  is  the 
complementary  slackness  condition. 

THEOREM  (Optimal  Basic  Set  Invariance):  Let  A(e),  b(e),  c(e)  be  continuous  functions 
for  e  e  [0,  e’J.  Let  the  solution  to  the  reduced  problem  (2)  be  unique  and  nondegenerate. 
Then  there  exists  a  positive  upper  bound  e  _  t'  such  that,  if  0  <  e  <  e"  then  the  optimal 
basic  set  of  the  perturbed  program  (1)  is  invariant,  and  its  unique  solution  is  given  by 

X'(«>  -WW  *N»f  <V"-21> 

where 

Xg  ( € )  is  {Xj  (e) ,  jeJ0)  =  B'Me)b(e)  (VII.22) 

xN’(e)  =  0  (VII.23) 
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r(e)  =  (B -1)T(e)cB(€) 


(VII.24) 


Note  that  the  set  J0  consists  of  the  indices  in  the  base  of  the  reduced  problem.  So  the 
claim  here  is  that  the  set  of  indices  for  both  optimal  solutions  of  the  reduced  and 
perturbed  problems  are  the  same.  Then,  to  obtain  a  solution  to  the  perturbed  program  it 
suffices  to  solve  the  reduced  problem,  determine  the  optimal  basis  index  set,  and  solve 
(VII. 13)  to  obtain  xB(e) .  Finally,  make  xN(e)  and  x*(e)  =  [xB(c)  x„(e)]  . 

Sketch  of  proof :  Let  x0*  =  tx0’B  x0*N]  be  the  unique  and  nondegenerate  optimal 

solution  of  the  reduced  problem.  It  must  satisfy  the  optimality  conditions  (VII.13-16). 
Substituting  (VII. 13)  into  (VII. 14)  and  (Vil.15)  into  (VII.16), 

B'^O)  b(0)  >  0 
BT(0)  (B  ’1T(0)  c,(0)  >  cN(0) 

Now,  if  A(f),  b(e),  cfe)  are  smooth  enough,  for  £  sufficiently  small 

B~’(€)  b(c)  >  0  (VII.27) 

Bt(€)  (B  1)T(c)  C8(e)  >  CN(e)  (VI  1.28) 

Inequalities  (VII.27-28)  check  the  validity  of  x*(e)  defined  as  in  (VII.21)  as  an  optimal 
solution  to  the  perturbed  problem. 


(VII. 25) 
(VI  1.26) 
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So  far,  we  are  determining  only  the  indices  of  the  solution  x(e)  that  form  the 
basis,  but  not  the  magnitudes  of  the  associated  components.  This  is  achieved  using  the 
following  corollary. 

COROLLARY:  Let  A(e),  b(e)  be  differentiable  up  to  and  including  order  m.  Then  there 
exists  an  e"  such  that,  for  all  e  e  [0,  e“],  the  solution  x0'(e)  may  be  written  in  the 
form 

x*(c)  =  Xq*  *  ex(1)  *  e2xm  *  ...  *  £mX(m)  ♦  Ofe"**1)  (VII.29) 


If 

A(e)  =  Ag  +  eA1  (VII. 30) 

b(£)  =  b0  +  eb,  (VII.31) 

c(e)  =  c0  +  ec,  (VII.32) 

then  using  conditions  (VII.21-22)  we  can  rewrite  formula  (VII.8)  in  the  form 

(A(,  +  cA,)  (Xq*  ♦  ex(1)  *  c2x(?>  *  ...  e,Vm>  *  o(em*1))  =  b0  +  eb, 
or,  expressed  in  powers  of  e: 

(AoXo'  -  bo)  -  e  (A1x0*  *  Ao x<’>  -  b,)  *  e2  (A,X<’>  *  A„x<2>)  ♦  ...  =  0 

in  which  the  first  term  is  identically  zero,  and  the  other  terms  give  the  following  recursive 

relationships  between  successive  vectors  x<f)  : 

V(1)  -  b,  -  A,Xo’ 

AoX(2)  =  -  A,x(1) 

AoX^1)  =  -  A,x»  (VII.33) 
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and  using  condition  (VII.32) 

F’(«)  *  F„-  .  e  [  c,TXo  •  (r)T  (6,  -  A,>0  ]  *  0<«*) 

F  *(c)  =  F0*  -  eF,'  *  ofc2),  Osese"  (VII.34) 

Thus,  we  have  a  complete  result  for  the  case  of  uniqueness  and  nondegeneracy. 
The  results  above  can  be  generalized  for  the  case  of  nonuniqueness,  provided 
special  conditions  are  met.  Let  conditions  (VII.30-32)  hold,  and  suppose  further  that 
6'  is  the  set  of  solutions  to  the  reduced  problem, 

A’  is  the  unique  dual  solution. 

Then 

F  ’(e)  =  F0*  +  e  F,*  -  0(6*)  (VII.35) 

with 

F,*  -  j;  *  b/r  (VII.36) 

where  J,’  is  the  optimal  value  of  the  objective  function  for  the  following  'auxiliary 
problem": 

min  c,  -  A/r 

x  e  0"  (VI  1.37) 

The  mentioned  special  condition  requires  the  solution  to  the  "auxiliary  problem"  must 
be  unique  and  be  also  a  nondegenerate  solution  to  the  reduced  problem. 

The  application  of  the  results  above  to  linear  programming  problems  can  be 
summarized  in  the  following  outline  of  an  algorithm: 

(1)  Solve  the  reduced  program  and  its  dual; 

(2)  If  the  solutions  in  (1)  are  unique,  then  J0  determines  the  optimal  basis  for  the 

perturbed  problem; 
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(3)  Determine  x(e)  as  a  solution  to  B(c)x  >  b  (c)  ; 

(4)  If  the  solution  to  the  reduced  program  is  not  unique,  but  it  is  nondegenerate,  then 


-  determine  the  optimal  set  9’ 

-  Solve  the  'auxiliary  problem*  (VII.37); 

(5)  If  the  solution  obtained  in  (VII.37)  is  unique,  it  determines  the  set  J0  and  the  matrix 
B(e).  Then  go  to  (3). 

(6)  If  the  solution  obtained  in  (VI  1.27)  is  unique  but  degenerate  then  we  need  to 
determine  the  optimal  set  A  of  the  dual  program  and  to  solve  the  dual  auxiliary 
program; 

(7)  If  the  solution  obtained  in  (6)  is  unique  then  the  corresponding  basis  gives  J0  and 
B(c).  Then  go  to  (3). 

Figure  17  represents  this  flow  in  a  schematic  manner. 

The  work  presented  up  to  this  point  is  intended  to  familiarize  the  reader  with  some 
of  the  basic  ideas  of  optimization,  network  problems  and  optimality  conditions,  and  the 
optimization  methods  that  could  be  considered  as  members  of  a  wide  family  of  multilevel 
methods  (in  which  also  some  aspects  of  the  techniques  described  in  this  Chapter  can  be 
included).  We  are  in  position  to  describe  the  peculiarities  of  the  multigrid  methodology,  to 
be  done  next  in  Chapter  VIII.  Finally,  the  objectives  of  the  research  will  be  addressed  in 
Chapter  IX. 


74 


75 


VIM.  THE  MULTIGRID  METHOD 


A.  INTRODUCTION 

Detailed  descriptions  of  the  multigrid  method  are  given  in  Briggs,  1987,  or 
Hackbusch,  1980.  Here  we  will  summarize  the  essential  facts  that  could  help  in  building 
the  appropriate  approach  to  solving  certain  types  of  optimization  problems  by  means  of 
that  technique. 

Consider  the  problem  of  heat  propagation  along  a  finite  rod  of  unit  length.  This  is 
a  boundary  value  problem  expressed  by  the  second-order  ordinary  differential  equation 

•  u"  +  O  U  S  f,  0  <  X  <  1 ,  o  £  0, 

u(0)  =  u(1)  *  0  (VIII.  1) 

where  u  and  f  are  functions  of  x.  Numerically,  the  problem  can  be  solved  using 
a  finite  difference  method  (Gerald,  et  al,  1989).  The  domain  of  the  problem  is 
partitioned  into  N  subintervals  of  constant  length  h  =  1/N.  The  partition  so  obtained 
defines  a  grid  of  N+1  points,  including  the  end  points  0  and  1 .  The  resulting  second- 
order  finite  difference  approximation  produces  a  system  of  linear  equations 

A  x  =  f  (VIII.2) 
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The  system  (VIII.2)  has  the  following  special  structure: 


1 

h 2 


2+oh2  -1 

-1  2+oh2  -1 


-1 

-1  2+c  h2 


'  ft  ' 

*2 

4 

* 

* 

• 

VN-2 

^N- 2 

J 

.  V»-'  . 

.  '"-1  . 

(VIII.3) 


Equation  (VIII.3)  is  sometimes  abbreviated  by  the  difference  stencil  (Hackbusch, 
1980),  as  follows: 

h‘2  [-1  2+oh2  -1]  v  =  f 

Conventional  iterative  methods  of  solution  use  an  initial  guess  v(0>  as  initial  solution. 
The  proximity  to  the  true  solution  u  is  measured  by  the  error  e  *  u  -  VQ),  which  can  be 
regarded  as  the  exact  correction  to  the  (in  this  case)  initial  guess.  The  error  is  unknown. 
One  way  to  estimate  this  proximity  is  by  using  the  residual.  So,  if  v  is  an  approximation 
to  the  true  solution,  the  residual  r  is  defined  as: 


r  =  f  -  A  v  =  Au-Av  =  A(u-v)  =  Ae 


(VIII.4) 


In  (VIII.4)  the  residual  r  measures  how  well  v  solves  (VIII.2). 
The  expression 

A  e  =  r 


(VIII.5) 
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is  called  the  naktual  aquation.  It  is  important  to  note  that  the  residual  equation  uses  the 
same  matrix  as  the  original  equation.  This  is  the  tint  key  tact. 

Conventional  iteration  processes  are  represented  as 

v0>  «  pyW.g.  (VIII.6) 

where  P  is  the  iteration  matrix.  The  error  after  n  iterations  is 

e<n)  =  u  -  v<n)  =  (Pu  +  g)  -  (Pv^'  +  fl) 

=  P  u  -  P  =  p  (u  -  v^'*)  =  P  e(n‘1), 

therefore 

I  e(n)  |  »  |  P  r  |  e<°i  |  (VIII. 7) 

It  can  be  shown  that 

lim  Pn  =  0  if  and  only  if  max  |  A.^P)  |  <  1  (VIII.8) 

n—  I 

where  (p)  are  the  eigenvalues  of  P.  This  suggests  that  not  all  linear  systems  of 
equations  can  be  solved  using  these  conventional  iterative  techniques.  The  success  using 
those  approaches  will  depend  in  the  structure  of  the  matrix  A  defining  the  problem. 

Assuming  the  problem  is  solvable  and  the  matrix  A  nonsingular,  the  error  e(0> 
corresponding  to  the  initial  guess  of  the  solution  can  be  expressed  as 

e<°>  =  £  CkW,,  (VIII.9) 

k-1 
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where  ck  is  a  real  coefficient,  and  wk ,  k  =  1.2.....N-1  are  the  eigenvectors  of  A. 

A  second  key  led  is  that  the  error  is  smoothed  by  conventional  iterative  methods. 
(See  Chapter  I).  The  fact  allows  that,  after  k  iterations,  e(k>  can  be  usually 
approximated  with  fewer  points  than  v04.  This  can  be  exploited  to  represent  e00  on  a 
coarser  grid. 

In  the  case  (VIII. 1),  the  eigenfunctions  have  the  form  of  sine  functions  (Fourier 
modes).  The  km  eigenfunction  is  expressed  as 

w„  =  Ck  sin  (k  n  x)  (VIII.10) 

k  is  called  the  wevenumber,  equal  to  the  number  of  half  cycles  which  constitute  v  in 
the  domain  of  the  problem. 

Correspondingly,  the  k*  eigenvector  of  the  matrix  A  in  (VIII.2),  expressed  as  a  one¬ 
dimensional  array  (vector)  of  N  equi-spaced  sample  points  has  the  following  form  for  its 
components: 

wk  3  =  sin|-lj^Lj  with  1<  k£  N-1  and  0<j<N  (VIII. 11) 

and  we  say  that  the  vector  is  represented  in  a  N-grid. 

In  multigrid,  grids  are  denoted  in  terms  of  the  distance  ’h’  between  two  consecutive 
points  of  the  grid.  If  the  mesh  size  is  h  then  we  denote  the  corresponding  grid  as  fth.  For 
this  last  to  make  sense,  it  is  necessary  to  have  fixed  the  total  length  of  the  grid  domain. 

As  an  example  of  a  class  of  problems  in  which  multigrid  techniques  are  successfully 
applied,  Figures  18  and  19  represent  the  results  of  applying  a  conventional  Jacobi  iterative 
process  to  the  system  of  linear  equations  corresponding  to  the  heat  propagation  along  a 
rod  of  unit  length.  The  problem,  stated  above,  is  described  as  an  example  in  Briggs,  1987. 


79 


Figure  18.-  Change  in  the  Error  Vector. 
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Figure  19.-  Change  in  the  Residual  Vector. 
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Here  we  have  modified  the  setting  to  be  more  general  in  that  we  allow  the  right  hand  side 
of  the  system  to  be  other  than  zero.  We  chose  the  values  of  the  right  hand  side  to  be 
integer  random  numbers.  The  details  and  the  *  Mathematics*  code  used  to  solve  the 
problem  are  included  as  Appendix  C.  It  also  contains  the  numerical  values  of  the  right 
hand  side  vector,  and  some  intermediate  results  that  could  be  used  for  checking 
purposes. 

A  total  of  200  Jacobi  iterations  were  performed.  Figure  18  shows  the  change  in  the 
error  vector.  An  arbitrary  oscillatory  error  was  introduced  as  initial  value,  a  consequence 
of  the  oscillatory  initial  guess  for  the  solution.  It  can  be  seen  that  the  error  becomes  less 
and  less  oscillatory,  as  the  number  of  iterations  increases.  (It  is  easily  seen  that  the 
number  of  crossings  of  the  error  curves  through  the  x-axis  decreases  with  increasing 
iterations).  After  199  iterations,  only  a  very  smooth  wave  remains.  This  kind  of  behavior 
in  the  error  is  required  for  multigrid  techniques  to  be  successful.  For  completeness,  we 
also  graphed  the  change  in  the  residual  vector.  This  is  represented  in  Figure  19. 

A  second  example  was  constructed  having  the  solution  shown  in  Figure  20.  This  is 
a  more  oscillatory  solution.  The  next  figure  shows  the  evolution  of  both  error  vector  and 
current  solution  after  a  few  iterations  are  performed.  Notice  the  smoothing  of  the  error, 
despite  the  lack  of  smoothness  in  the  current  solution. 

The  program  included  in  Appendix  C  allows  for  an  “animated*  picture  of  both  error 
and  residual  vectors  for  the  first  example.  Also,  it  can  be  used  for  studying  different  values 
of  the  right  hand  side  vector. 
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Figure  20.-  True  Solution  for  Second  Example. 

B.  GENERAL  MULTIGRiD  SCHEME. 


In  section  A  we  have  seen  one  condition  for  multigrid  to  work,  i.e.,  the  error  vector 
should  be  smoothed  as  the  iteration  progresses.  Now  let  us  examine  what  happens  to  the 
error  when  appropriately  "moved”  from  a  finer  grid  to  a  coarser  grid.  Suppose  that  our 
error  vector  is  sufficiently  approximated  by  a  wave  with  wavenumber  k  =  4  (Figure  22(a)), 
posed  on  a  grid  with  12  grid  points,  which  we  will  refer  to  as  Qh .  Furthermore,  suppose 
that  there  is  no  higher  frequency  component  in  the  error  vector  (i.e.  an  iterative  process 
has  taken  place  that  eliminated  them  previously).  The  former  grid  is  capable  of  "detecting" 
frequency  components  with  wavenumber  less  than  k  =  12,  as  can  be  seen  in  Figure 
22(b)(c).  Note  that  the  wave  with  k  =  12,  N  =  12  is  indistinguishable  from  the  wave  having 
k=0,  N=12.  Higher  k’s  cause  an  "aliasing  phenomena,  depicted  in  Figure  23(d)  (they  are 
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Figure  21.- 


Second  Example.  Evolution  of  the  Error  Vector  and  Current 
Solution. 


84 


Cl)  k  -  4,  N  -  12 


Figure  22 Representation  of  various  components  of  the  Error  Vector. 
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Figure  23.- 


Representation  of  various  spectrum  components  of  the  Error  Vector 
(cont). 


confounded  with  other  less  oscillatory  waves). 

Let  us  go  back  to  our  error  k  =  4  wave.  In  the  spectral  representation  of  the 
"detectable"  waves  on  the  N  =  12  grid  (Figure  24(a))  this  wave  lies  in  the  first  third  of  the 
spectrum.  We  can  say  that  it  belongs  to  the  class  of  the  spectrum’s  low  frequency 
components,  defined  as  those  waves  located  in  the  first  half  of  the  spectrum. 

Now  let  us  express  the  corresponding  spectrum  for  a  N  =  6  grid.  This  is  done  in 
Figure  24(b).  Now  the  range  of  "detectable"  waves  goes  up  to  k  =  6.  In  this 
representation,  the  wave  with  k  =  4  is  located  on  the  second/last  third  of  the  spectrum.  We 
can  say  that  this  wave  is  a  high  frequency  (or  oscillatory)  component,  a  term  defined 
in  a  similar  fashion  as  in  the  preceding  paragraph.  Thus,  our  error  vector  becomes  more 
oscillatory  when  transferred  to  the  coarser  grid.  This  is  a  third  key  fact. 

Figure  24  helps  in  understanding  one  of  the  basic  mechanisms  of  the  multigrid 
technique.  Whenever  the  error  vector  is  formed  only  by  low  frequency  components  in  the 
relative  context  of  a  grid,  we  can  force  it  to  become  more  oscillatory  by  transferring  it  to 
another  coarser  grid.  Typically,  this  is  done  doubling  the  grid  mesh  (i.e.  halving  the 
number  of  grid-points).  This  action  is  supported  by  invoking  the  fourth  key  fact :  an  error 
that  is  smooth  on  a  grid  Qh  can  be  accurately  transferred  to  a  coarser  grid  t2Zh  and 
furthermore,  the  error  representation  in  QZh  contains  enough  information  to  be 
interpolated  to  the  next  finer  grid  Qn  and  produce  an  accurate  representation  of  the  error 
there. 

Now  we  can  point  out  what  the  multigrid  approach  is  in  the  above  problem.  Figure 
18  represents  the  smoothing  of  the  error  vector  when  iteration  is  in  progress.  So  we  relax 
(iterate)  on  a  grid  until  the  error  representation  is  "sufficiently  smooth"  on  that  grid.  The 
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error  then  can  be  accurately  represented  on  a  coarser  grid.  This  transfer  is  accomplished 
without  loss  of  information.  But  now,  the  error  representation  on  the  coarser  grid  appears 
to  be  more  oscillatory.  A  second  relaxation  process  in  this  new  grid  will  eliminate  the 
“new"  oscillatory  components  faster.  When  the  relaxation  process  shifts  the  error  to  the 
low  frequency  area  of  the  spectrum  it  is  time  to  transfer  it  to  a  new  grid,  still  coarser,  and 
so  on. 

Two  strategies  form  the  core  of  the  multigrid  technique.  The  first  of  them  solves  a 
problem  on  a  coarser  grid  to  obtain  an  initial  guess  on  the  next  finer  grid.  This  is  called 
nested  iteration.  The  idea  is  to  find  an  initial  guess  with  little  computational  effort,  by 
means  of  building  the  initial  guess  on  the  next  coarser  grid.  A  basic  principle  lies  under 
this  idea:  it  is  cheaper  to  find  a  solution  if  the  initial  guess  is  good,  and  nested  iteration 
provides  a  good  guess  cheaply.  The  second  strategy  uses  the  residual  equation  to  obtain 
an  approximation  to  the  error  in  the  present  grid.  That  approximation  is  computed  by 
relaxing  the  residual  equation  on  the  next  coarser  grid.  This  version  of  the  error  is 
translated  back  to  the  current  (finer)  grid  to  be  applied  as  a  correction  to  the  present 
representation  of  the  solution  vector.  The  advantage  is  that  it  is  cheaper  to  work  on  a 
coarser  grid.  This  second  strategy  is  called  coarse  grid  correction. 

The  operation  that  involves  transferring  from  coarser  to  finer  grids  is  called 
interpolation,  while  the  opposite  operation,  i.e.,  transferring  from  finer  to  coarser  grids, 
is  denoted  as  restriction.  Although  in  the  standard  multigrid  methodology  the  operators 
performing  those  processes  are  linear,  in  general  interpolation  and  restriction  could 
conceivably  be  nonlinear  operators. 
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The  way  one  combines  iterations  at  the  different  levels  of  coarsening  yields  several 
multigrid  patterns,  known  as  multigrid  cycles  or  schemes.  The  most  common  are  the  V- 
cycle,  the  p-cycle  and  the  Full  Multigrid  V-Cycle,  described  by  Briggs  (1987).  We  will 
briefly  describe  them  next. 

1.  p-Cycle 

It  is  schematically  represented  in  Figure  25.  The  operations  performed  are: 

(a)  Relax  v,  times  on  the  initial  grid  with  a  given  initial  guess  vh;  these 
relaxations  (iterations)  make  the  error  eh  become  smooth. 

(b)  Restrict  the  residual  equation  to  the  next  coarser  level  Q2h.  Since  the  structure  of 
this  problem  is  a  parallel  of  the  original  problem,  solve  this  residual  equation 


89 


recursively,  by  a  call  to  the  u-cycle.  Do  this  \i  times.  As  a  result,  we  obtain  an 
approximation  for  e2*. 

(c)  interpolate  to  obtain  an  approximation  of  eh  on  the  fine  grid  flh. 

(d)  Add  the  error  eh  to  v*  to  obtain  an  improved  approximation  for  v*. 

(e)  Relax  v2  times  the  equation  Ah  uh  =  f  with  initial  guess  v*  computed  in  (d). 

Achieving  the  vedor  vh  by  (a)  through  (e)  is  done  much  more  rapidly  than 
proceeding  by  (a)  alone.  This  "speed-up"  in  the  approximation  process  is  obtained  thanks 
to  the  third  and  fourth  key  facts  described  earlier  in  this  chapter,  that  allow  for  working  on 
a  simpler  grid  (fewer  points),  thus  making  the  computational  work  less.  Concurrently,  the 
oscillatory  components  of  the  restrided  error  are  eliminated  quickly  by  the  relaxation 
steps.  This  is  a  crucial  fad  in  multigrid. 

The  V-Cycle  is  a  particular  version  of  the  n-Cycle,  for  the  special  case  that 
\l-  1.  When  m-  =  2  the  resulting  scheme  is  called  the  W-Cycle. 

2.  Full  Multigrid  V-Cycie  (FMV) 

The  FMV  is  depicted  in  Figure  26.  The  representation  is  taken  from  Wesseling 
(1992).  Other  authors  (Briggs,  1987)  consider  the  scheme  starting  at  the  coarsest  grid  (i.e. 
the  initial  coarsening  to  the  coarsest  grid  is  implied,  and  not  shown).  Briefly,  the 
operations  performed  are: 

(a)  Restrict  the  original  equation  to  all  grids  (in  particular,  to  the  coarsest). 

(b)  For  k  =  1,  2, ....  M  (coarsest  grid) 

Solve  AH  uH  =  fH  by  performing  a  n-cycle  v0  times; 

Interpolate  to  the  next  finer  grid  ,  i.e.  from  QH  to  £2h  (usually  h  =  H/2); 

Repeat  until  the  finest  grid  is  reached. 
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Figure  25.-  Schematic  representation  of  the  Multigrid  fi -Cycle. 

This  chapter  has  presented  the  principles  and  mechanisms  of  the  multigrid  method. 
We  are  now  prepared  to  analyze  the  feasibility  of  such  an  approach  to  the  solution  of 
optimization  problems. 
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Figure  26.-  Full  Multigrid  Cycle,  (a)  \0  *  1;  (b)  v0  -  2  (not  completed). 
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IX.  APPLYING  THE  MULTIGRID  METHOD  IN  OPTIMIZATION:  THE  LONG 
TRANSPORTATION  PROBLEM 

A.  INTRODUCTION 

In  Chapter  VIII  we  have  seen  that  multigrid  is  not  a  specific  solution  technique,  but 
rather  a  philosophy  of  attacking  problems  that  is  general  and  powerful  when  applied  to 
suitable  problems.  Specific  solution  techniques  must  be  used.  The  restriction  and 
interpolation  operations  are  general  concepts  that  should  be  defined  in  the  context  of  each 
problem.  It  is  the  idea  of  changing  the  grid  when  the  error  is  smooth  that  characterizes 
multigrid,  as  well  as  the  idea  that  a  solution  in  a  "simpler”  grid  is  a  good  starting  point  for 
attacking  the  problem  on  the  next  "finer"  grid. 

The  next  two  sections  list  the  characteristics  of  the  types  of  problems  that 
traditionally  have  been  solved  using  multigrid,  and  a  comparative  list  for  the  class  of 
problems  that  typically  are  solved  by  optimization.  Section  D  reviews  some  previous 
research.  The  objective  in  sections  E  and  F  is  to  extract  the  positive  lessons  from  these 
analyses,  point  out  the  difficulties  found  and  analyze  them.  Some  insights  into  the 
multigrid  method  are  included  with  conclusions  that  hopefully  will  a»d  the  better 
understanding  of  the  optimization  procedures  that  could  support  the  implementation  of 
algorithms  based  on  the  multigrid  methodology. 

B.  CHARACTERISTICS  OF  MULTIGRID  PROBLEMS 

Multigrid  methods  have  been  successfully  applied  to  solve  certain  partial  differential 
equations.  These  kind  of  problems,  when  solved  numerically,  are  approximated  by 
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systems  of  algebraic  equations.  Restricted  to  the  linear  case,  we  observe  that  these 
problems  are  characterized  by  the  following  common  features: 

They  usually  correspond  to  physical  processes,  i.e.,  energy  propagation, 
variation  of  space  parameters  when  the  subject  is  exposed  to  a  system  of 
forces,  etc.  in  general,  these  type  of  problems  describe  phenomena  in  which 
there  is  a  weak  influence  between  pairs  of  local  states  corresponding  to  areas 
located  far  apart  in  the  system. 

The  system  of  equations  that  describe  the  phenomena  usually  express  the 
influence  of  particular  values  of  the  solution  (energy,  speed,  flow,  etc)  on  a 
certain  neighborhood  of  their  space.  In  the  example  case  of  heat  propagation 
through  a  rod,  every  row  of  the  matrix  A  representing  the  system  is  obtained 
by  a  pattern  of  values  that  affect  immediate  neighbor  components  of  the  u 
vector  (see  chapter  VIII).  The  zeros  in  each  row  of  the  matrix  express 
mathematically  the  weak  effect  mentioned  above,  in  the  sense  that 
interrelation  between  variables  is  restricted  to  the  (neighbor)  columns  having 
nonzero  coefficients. 

The  weak  global  dependence  between  grid  points  determines  its 
behavior  through  successive  iterations.  That  is,  given  the  state  of  the  system, 
the  local  variation  induced  by  a  variable  change  is  propagated  as  a  sequence 
of  local  interactions,  and  the  resulting  state  of  the  system  is,  in  general,  very 
close  to  the  initial  one. 
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The  problems  are  originally  posed  on  a  continuum.  For  computation  purposes, 
the  problem  is  discretized.  The  initial  values  of  the  variables,  represented  as 
vectors,  constitute  a  sample  of  the  continuous  variables  of  the  problem. 
Similarly,  the  solution  vectors  obtained  are  samples  of  continuous  variables. 
For  a  sufficiently  fine  grid,  if  we  express  the  values  of  the  n  variables  at  the 
grid  points  as  n- tuples,  the  knowledge  of  the  n-tuple  corresponding  to  a  grid 
point  gives  us  some  information  about  what  the  coordinates  of  the  neighbor 
grid  points  could  be.  We  can  express  this  property,  inherent  in  continuous 
functions,  by  saying  that  the  problems  have  an  implicit  ordering  in  the 
composition  of  the  solution  vector.  In  Figure  27(a)  a  sample  vector  of  x- 
values  is  represented.  The  dark  area  corresponds  to  the  continuous  set  of  x- 
values.  The  implicit  ordering  is  materialized  by  the  function  f(x).  This  property 
is  important  for  interpolation  purposes. 

These  problems  are  convergent  when  solved  by  iterative  processes.  That  is, 
when  the  same  problem  is  solved,  using  progressively  approximated  solutions, 
the  error  becomes  smaller  and  smaller.  This  is  typically  expressed  as  the  error 
being  0(hp)  for  some  p. 

As  a  result  of  being  exposed  to  iterative  procedures,  the  error  term  is 
smoothed.  Thus,  after  some  number  of  iterations  little  improvement  is 
achieved,  since  smooth  errors  are  very  slowly  reduced. 
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Figure  27.-  X-Vector  (a)  PDE  Problem;  (b)  Optimization  Problem 

Since  the  error  is  not  available,  the  residual  is  used  as  a  reference  of  the 
progress  in  the  solution  process.  Although  not  necessarily  the  case,  the 
residual  is  normally  a  good  reference  for  this  purpose. 

The  restriction  operator  transfers  the  problem  from  a  grid  to  the  next  coarser 
grid.  This  means  that  it  needs  to  define  the  different  aspects  of  the  problem 
on  the  coarser  grid.  The  interpolation  operator  performs  the  opposite 
operation.  The  fact  that  these  problems  possess  a  strong  local  coupling, 
together  with  the  fact  that  the  solution  vectors  are  samples  of  continuous 
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sets,  are  key  features  for  the  interpolation  process  to  work  properly.  In 
standard  multigrid,  both  operators  are  linear. 

C.  CHARACTERISTICS  OF  (UNEAR)  OPTIMIZATION  PROBLEMS 

The  previous  chapters  have  considered  some  optimization  principles  and  techniques 
that  could  support  a  multigrid  approach  to  solving  (linear)  optimization  problems.  Let  us 
focus,  for  simplicity  and  consistency,  on  transportation  problems. 

In  general,  they  are  constrained  problems.  They  usually  carry  an  associated 
set  of  linear  constraints.  This  set  is  expressed  as  a  matrix,  and  is  used  to 
check  for  the  validity  (feasibility)  of  the  current  solution.  As  the  computation 
proceeds,  the  current  solution  vector  solves  a  system  of  equations  defined 
using  only  a  submatrix  ( basic  submatrik)  of  the  constraint  matrix. 

The  objective  of  the  problem  is  find  a  set  of  values  for  the  variables  that 
minimize  (maximize)  a  function  (objective  function)  of  the  n  variables  of  the 
problem.  There  is  no  parallel  concept  to  that  of  the  residual  in  the  solution  of 
the  systems  of  equations.  On  the  other  hand,  since  the  main  goal  is  maximize 
(minimize)  the  objective  function,  there  is  a  valid  reference  (at  hand)  of  how 
close  we  are  to  the  solution  as  the  process  proceeds.  This  is  because,  at 
each  step,  bounds  for  the  optimal  value  of  the  objective  function  can  be 
calculated,  providing  a  means  to  measure  that  proximity.  (Notice  that  usual 
optimization  algorithms,  i.e.,  simplex  method,  keep  track  of  the  best  solution 
obtained  and  bounds.  They  proceed  "improving*  the  current  objective,  so  that 
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as  more  steps  are  performed,  the  bounds  on  the  objective  function  are 
narrower). 


Contrary  to  the  PDE  case,  the  current  solution  (including  in  this  term  both 
initial  and  final  solutions)  is  represented  by  a  vector  which  does  not  constitute 
a  sample  of  a  continuous  set  of  values.  This  is  schematically  presented  in 
Figure  27.  The  set  of  values  of  the  variables  in  an  optimization  problem  is  (a) 
finite  and  (b)  is  formed  by  all  the  possible  data  points. 

In  general,  there  is  no  relation  of  proximity  determining  the  structure  of  the 
equations.  That  is,  contiguous  sources  or  sinks  are  not  necessarily  correlated, 
they  do  not  necessarily  influence  each  other.  Compared  to  the  PDE  case,  we 
can  say  that,  as  a  general  rule,  if  we  knew  the  value  of  a  flow  component  x„ 
this  says  little  about  the  values  of  "contiguous"  components  xM  or  x^,  for 
example.  There  is  no  implicit  order  materialized  in  the  form  of  a  relationship 
like  f(x)  in  the  continuous  case  (a). 

When  the  problems  are  bounded,  they  have  an  optimal  solution.  The  optimal 
solution  may  or  may  not  be  unique.  Optimization  techniques  yield  solutions 
generally  based  on  an  enumeration  process.  It  is  the  finite  structure  of  the 
problem  which  guaranties  the  solution  to  the  problem. 
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In  most  practical  problems,  each  variable  has  individual  bounds.  Normally, 
these  bounds  are  locally  assigned,  and  constitute  an  essential  parameter  in 
determining  tee  problem  behavior  when  being  solved. 


0.  PREVIOUS  WORK 

Ron  (1987),  proposed  the  idea  of  a  multigrid  approach  for  the  N*dty  Traveling 
Salesman  Problem  (TSP).  This  is  a  standard,  NP-complete  problem  that  arises  in  several 
contexts.  For  a  description  see  Roberts  (1984).  An  estimate  of  the  optimal  length  of  the 
closed  TSP  path  L  for  this  case  is  known  to  be  equal  to 

L  =  0.749 y[N  *  o(sfN)  (91) 

for  sufficiently  large  values  of  N  (Bonomi  and  Lutton,  1984).  The  Asymptotic  TSP  (ATSP) 
finds  paths  within  a  few  percent  of  the  above  value.  In  her  research,  Ron  considers  a 
number  N  of  cities  randomly  (e.g.  in  a  uniform  distribution)  positioned  in  tee  unit  square. 
Instead  of  taking  an  arbitrary  first  approximation  (random  order  of  cities),  she  constructs 
an  approximation  as  follows.  Coarsen  tee  problem  by  creating  a  sequence  of  fN/2]  pairs 
of  cities  (referred  as  coarse  cities),  each  being  located  at  the  mean  positions  of  the  two 
cities  it  represents.  The  pairs  are  constructed  by  associating  the  nearest  available  one 
with  each  city  in  turn.  Her  method  was  tested  in  a  variety  of  examples  and  gave 
reasonable  first  approximations,  yet  it  failed  to  become  a  better  solver  since  tee  coarser 
problems  are  not,  in  general,  geometrically  equivalent  to  the  finer  problems,  and  thus  the 
solution  interpolated  from  a  coarser  level  is  not  necessarily  a  solution  for  the  finer  one. 
Still,  for  the  ATSP  she  reports  results  with  pates  no  longer  than  30%  off  the  optimal 
solution. 
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There  are  some  characteristics  in  the  ATSP  problem  that  makes  it  attractive  to  being 
approached  via  multigrid.  As  the  number  of  cities  grows,  the  minimization  processes  at 
regions  of  the  plane  far  apart  are  weakly  coupled  to  each  other.  Surely,  the  minimized 
path  will  not  jump  back  and  forth  between  cities  far  distant  from  each  other,  i.e.,  it  is 
expected  first  to  visit  neighbor  cities  before  switching  to  other  far  regions.  This  implies  the 
assumption  of  neighborhood  influence,  described  in  section  A.  That  is,  the  cost  of  vis 
remote  cities  before  nearby  cities  must  be  relatively  large.  In  such  conditions,  the  reported 
results  are  logically  explained.  However,  from  the  optimization  theory  standpoint,  the 
computational  results  cannot  be  considered  attractive.  Ron  reports  better  results  in 
applying  multigrid  to  a  physics  problem,  the  Ising  spin  problem,  which  is  the  core  of  her 
thesis. 

Kaminsky  (1989)  proposes  an  algorithm  for  the  long  transportation  problem  using 
multigrid  techniques.  Presumably,  the  selection  of  the  long  transportation  problem  for 
experimentation  comes  from  the  fact  that  origin  nodes,  being  small  in  number,  are  left 
intact,  i.e.,  they  do  not  change  through  the  process  of  being  exposed  to  the  different  grids. 
This  simplifies  the  problem.  He  defines  the  problem  as  a  geometrical  transportation 
problem,  because  of  the  additional  assumption  that  origins  and  destinations  have 
locations  in  space,  and  the  cost  of  shipping  between  any  pair  of  origin-destination  nodes 
is  related  to  the  distance  between  them.  (The  idea  of  intrinsic  ordering  appears  in  the 
setting  of  this  problem,  allowing  it  to  be  posed  as  multigrid).  Kaminsky  uses  aggregation 
to  define  the  coarse  destinations.  Two  neighbor  nodes  on  grid  h  are  aggregated  into  a 
single  node  in  grid  2h,  with  demand  equal  to  the  sum  of  the  two  demands.  Supply  nodes 
are  not  aggregated.  Two  different  rules  are  used  to  set  the  costs  in  the  coarser  problem. 
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The  first  rule  is  a  * weighted  aggregation?  (Zipkin,  1977),  which  provides  the  best  initial 
solution,  compared  with  other  schemes.  The  second  rule,  valid  for  subsequent  steps, 
makes  use  of  a  previous  state  of  the  problem,  in  this  rule,  the  costs  to  coarser 
destinations  are  the  dual  solution  values  associated  with  each  origin  node  when  solving 
the  associated  block  problem.  This  block  problem  is  defined  for  a  single  coarse  node. 
There  are  as  many  block  problems  as  coarse  destination  nodes  at  each  level.  The 
interpolation  process  (coarse-to-fine  grid)  is  the  solution  of  the  block  problems  themselves. 
In  Figure  28  there  is  an  interpretation  of  an  individual  block  problem.  The  values  Xu  are 
the  flows  computed  when  solving  the  coarse  problems,  i.e.  flow  from  supply  node  i  to 
coarse  destination  J. 

Of  special  interest  is  the  discussion  that  Kaminsky  provides  on  local  relaxation  in 
Chapter  5  of  his  thesis.  A  process  similar  to  that  of  coarse  to  fine  interpolation  is  applied 
to  a  subgroup  of  destinations  to  “improve''  the  current  feasible  solution  at  a  determined 
state  of  the  problem.  This  involves  solving  a  linear  programming  subproblem.  The  reports 
of  the  experimentation  are  that,  by  applying  such  method  of  relaxation,  the  results  are 
ineffective  in  improving  a  solution  in  which  the  origins  supply  sets  of  destinations  that  are 
contiguous  in  the  space.  But  this  is  exactly  the  expected  character  of  the  optimal  solution 
(regionalized  solution).  The  converse  is  true  when  the  solution  is  not  regionalized.  The 
best  way  to  do  local  relaxation  is  left  as  an  open  question.  Even  so,  his  reported  results 
can  be  considered  an  important  step  towards  multigrid-based  optimization  solving 
techniques. 

Cavanaugh  (1992)  removes  the  spacial  dependence  of  shipment  costs  in  an  attempt 
to  generalize  the  problem  proposed  by  Kaminsky.  This  extension  is  important,  since  many 
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Figure  28-  Kaminsky’s  Block  Problem  for  Coarse  Destination  Node  ’ J 


of  the  problems  -  a  large  majority  in  real-life  •  that  can  be  formulated  as  transportation 
problems  have  no  physical  space  interpretation.  The  addition  of  graph-theoretic  properties 
to  the  multigrid  components  makes  his  work  attractive.  He  considers  the  proximity  of 
nodes  in  the  cost  space,  which  we  define  as  follows: 

Given  a  transportation  problem  (TP),  consider  the  set  c  of  m-tuples  (c„  c2 . cj, 

where  the  q’s  are  all  the  possible  costs  that  could  be  assigned  to  the  shipment  of  a  unit 
of  commodity  from  origins  1,  2,  ....  m,  respectively.  The  set  c  is  referred  to  as  the 
cost  space  of  the  transportation  problem  TP.  We  can  also  say  that  TP  is  defined  on  the 
cost  space  c.  The  cost  space  is  an  m-dimensional  euclidean  vector  space.  For  practical 
purposes,  sometimes  we  restrict  the  c,’s  to  be  integers  and,  sometimes,  to  nonnegative 
numbers.  For  each  particular  transportation  problem  the  set  of  arc  costs  is  a  finite  subset 
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of  its  cost  space.  Each  destination  node  j  has  an  associated  point  in  the  cost  space 
defined  as  (c1( ,  c*,, ....  c^.  The  k"*  coordinate  of  destination  node  j  is  the  the  cost  c*,  of 
shipment  to  it  from  supply  node  k. 

Cavanaugh,  like  Kaminsky,  uses  aggregation  of  nodes  (in  the  cost  space)  to  design 
the  problems  associated  to  the  coarser  grids.  Also  as  in  the  previous  work,  there  is  no 
limit  on  the  capacity  of  the  arcs  defining  the  associated  network.  The  way  nodes  are 
selected  to  be  aggregated  is  as  follows:  the  demand  nodes  are  first  sorted  by  increasing 
cost  of  shipping  from  supply  node  1 ,  and  divided  into  two  groups  about  the  median  of  the 
sorted  cost  (Figure  29(a)).  Each  of  the  resulting  groups  are  then  subdivided  into  two 
groups,  the  criterion  now  being  the  cost  of  shipping  from  supply  node  2  (Figure  29(b)), 
and  so  on.  For  simplicity.  Figure  29  represents  a  problem  with  only  two  supply  nodes. 
Each  group  created  in  this  way  is  then  sorted  by  the  same  method,  generating  smaller 
groups.  The  process  is  repeated  until  groups  of  size  2  are  all  that  remains. 

The  interpolation  process  solves  similar  problems  as  those  mentioned  in  Kaminsky’s 
thesis.  But  Cavanaugh  makes  use  of  the  tree  structure  of  basic  solutions  and  the  special 
characteristics  of  the  transportation  problem  to  speed  up  the  calculations.  Also  he 
introduces  the  use  of  reduced  costs  as  a  measure  of  optimality. 

The  local  relaxation  process  is  by  cycle-removal.  When  interpolation  is  performed, 
a  set  of  local  problems  is  solved.  These  problems  produce  locally  optimal  solutions.  Since 
all  the  local  problems  share  the  same  set  of  supply  nodes,  when  constructing  a  global 
solution  from  the  local  solutions  a  certain  overlapping  takes  place.  This  causes  the  graph 
of  the  global  solution  to  have  cycles.  Cycles  must  be  eliminated  for  the  solution  to  be 
optimal.  To  check  for  cycles,  two  adjacent  Mx2  local  problems  are  chosen  and  the  flows 
combined  to  form  a  solution  to  an  Mx4  "regional”  problem.  Cycles  are  sought  and 
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Figure  29.-  Cavanaugh ’s  coarsening  of  Demand  Nodes 


eliminated.  This  produces  a  local  relaxation  scheme.  The  mechanism  for  the  cycle 
removal  is  inspired  by  the  one  used  in  GNET  (Bradley  etal,  1977).  Cavanaugh’s  multilevel 
algorithm  performed  well  on  problems  with  only  two  or  three  supply  nodes.  His  solutions 
on  problems  consisting  of  3  supply  nodes  and  1024  demand  nodes  are  8.41%  above 
optimality  using  cycle  removal.  For  5  supply  nodes  and  equivalent  problem  size,  the 
results  are  at  least  58%  above  optimality,  which  cannot  be  considered  very  promising. 

Cornett  (1993)  continues  the  research  on  the  long  transportation  problem.  Node 
aggregation  in  cost  space  is  the  selected  mechanism  for  restriction.  In  Cornett’s  work, 
regular  grids  in  cost  space  are  considered  for  a  problem  consisting  of  three  supply  nodes. 
With  this  scheme,  nodes  are  grouped  together  with  other  nodes  lying  close  each  other  in 
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absolute  terms.  This  way,  the  influence  of  local  variations  is  more  ‘controlled*,  in  an 
attempt  to  restrict  the  effect  of  variations  to  the  dose  neighborhood.  Multigrid  methods 
tend  to  perform  well  in  such  conditions.  In  the  regular  grids  presented  by  Cornett,  once 
the  mesh  size  is  determined  for  a  grid,  the  cost  space  is  partitioned  in  elementary 
polytopes,  determined  by  intersections  of  the  coordinate  planes.  The  points  of  the  grid  are 
the  vertices  of  the  elementary  cells.  The  cells  surrounding  each  grid  point  define  an 
elementary  cell  in  the  next  coarser  grid,  and  so  on.  Notice  that  the  cost  space  is  also 
partitioned  in  this  coarser  grid  (no  overlapping  occurs),  so  not  all  the  grid  points  are 
centers  of  ceils  in  the  next  grid.  Figure  30  represents  a  regular  grid,  formed  by  the  small 
circles.  In  that  figure,  two  filled  and  larger  circles  represent  points  in  the  next  coarser  grid. 
Notice  in  the  figure  that  grid  points  located  on  each  side  of  a  cell  are  shared  by  two 
consecutive  cells.  Also  those  located  on  each  edge  are  shared  by  four,  and  those  on  each 
vertex  are  shared  by  eight  cells. 

The  restriction  operation  defines  the  demand  of  a  point  in  the  coarser  grid  by  the 
weighted  average  of  the  corresponding  ce/Ademands  (denoting  by  cell  to  the  set  of  points 
included  in  the  elementary  cube  centered  at  each  coarse  point). 

The  interpolation  process  divides  the  flow  on  a  coarser  node  “proportionally"  among 
its  corresponding  cell  demands  in  the  finer  grid.  The  flow  coming  from  each  of  the  supply 
nodes  is  considered  as  the  set  of  supplies,  so  defining  a  coarser  node  subproblem.  The 
global  solution  is  then  obtained  combining  the  solutions  obtained  from  these  local 
subproblems. 

The  relaxation  procedure  used  in  this  approach  consists  of  a  check  that  every 
demand  node  has  been  served  in  the  cheapest  possible  way.  For  those  flows  not  passing 
this  test,  a  pair  of  temporary  supply-demand  nodes  are  created.  The  supply-demand 
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values  of  these  pair  of  nodes  corresponds  exactly  to  the  flow  that  has  been  determined 
to  be  non-adequately  distributed.  Each  created  pair  is  then  associated  with  a  flow  in  the 
current  global  solution.  This  flow  is  removed  from  the  current  solution.  When  all  such  pairs 
of  temporary  nodes  have  been  created,  the  complete  set  of  temporary  nodes,  and  their 
corresponding  supply  and  demand  values,  define  the  flow  that,  in  the  current  solution,  is 
considered  to  be  not-optimally  allocated.  A  new  transportation  problem  involving  only 
those  flows  is  created  and  solved,  and  its  solution  combined  with  the  part  of  the  flow 
optimally  distributed,  so  obtaining  a  new  current  solution  in  the  relaxation  process. 
Actually,  a  maximum  of  only  one  node  is  created  for  each  of  the  existing  nodes,  and  the 
supply  (demand)  of  that  node  increased  by  the  value  of  each  non-optimal  flow.  Therefore, 
at  each  step,  the  original  problem  is  divided  into  two  parallel  problems,  one  containing  the 
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part  of  the  flow  vector  considered  ‘optimal*,  and  the  other  containing  that  part  of  the  flow 
vector  considered  ‘not  optimal*.  As  a  result,  the  relaxation  process  improves  the  flow  at 
each  iteration. 

By  using  the  above  scheme,  Cornett  designed  a  V-cycle  type  algorithm  that  solved 
problems  with  3  supply  nodes  and  up  to  29,791  demand  nodes  within  4%  of  optimality, 
using  up  to  5  grid  levels  in  the  design  scheme. 

E.  RESEARCH  OBSERVATIONS 

At  this  point,  it  is  useful  to  reflect  on  the  research  process  in  order  to  extract  some 
positive  lessons  and  to  point  out  other  possible  causes  of  problems.  This  is  in  fact  one  of 
the  objectives  of  the  present  work. 

Two  facts  deserve  special  attention  when  studying  the  works  described  above.  They 
will  be  considered  independently. 

1.  Node  Aggregation 

Node  aggregation  has  been  viewed  as  the  natural  way  of  restriction.  It 
presents  the  following  advantages:  (a)  the  coarsened  problem  is  the  same  type  of 
problem,  and  (b)  has  fewer  nodes.  The  first  property  is  typical  of  multigrid  contexts.  And 
the  second,  due  to  the  combinatorial  nature  of  optimization  problems,  reduces  effectively 
the  computational  effort  needed  to  solve  the  coarsened  problem. 

For  node  aggregation  to  be  suitable  for  multigrid  use,  it  should  maintain  the 
general  property  of  multigrid:  local  effects  must  propagate  weakly.  In  this  sense, 
aggregation  of  nodes  causes  two  conflicting  effects.  On  one  side,  grouping  of  neighbor 
nodes  maintains  a  hierarchical  structure  when  going  from  grid  to  grid.  On  the  other  side, 
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the  way  nodes  are  grouped  may  reinforce  global  influence.  This  is  illustrated  in  the 
following  example. 


Consider  a  simple  transportation  problem  with  two  supplies  and  eight  demands 
(Figure  31 ).  When  grouping  nodes  following  the  rule  used  by  Cavanaugh  one  would  arrive 
at  the  groups  depicted  in  (a),  in  opposition  to  the  more  natural  groupings  of  (b). 

Kaminsky  uses  two  different  rules  for  node  aggregation.  In  one  algorithm,  the 
total  area  of  the  problem  is  divided  into  two  "sub-blocks*  by  a  line  in  the  direction  of  the 
x-axis.  Each  one  of  these  sub-blocks  is  then  divided  by  a  line  in  the  direction  of  the  y-axis, 
and  so  on.  This  could  also  lead  to  situations  parallel  to  that  depicted  in  Figure  31 ,  but  in 
the  physical  space. 

A  different  approach  is  proposed  by  Kaminsky  in  his  second  algorithm.  Here 
a  "bottom-up"  blocking  is  achieved  by  constructing  a  graph  whose  vertices  are  the 
destinations  in  the  fine  grid.  Each  destination  is  connected  to  the  nearest  four  destinations 
in  space.  Then  the  destinations  are  blocked  by  finding  a  maximum  cardinality  matching. 
There  is  nothing  that  prevents  extreme  matchings  like  that  in  Figure  32(a),  hiding  the 
natural  structure  of  the  problem,  shown  in  Figure  32(b). 

2.  Regular  Grids 

Regular  grids  appear  to  better  deal  with  the  circumstances  described  in  the 
previous  paragraphs.  But  they  also  present  some  disadvantages. 

Setting  the  long  transportation  problem  on  a  regular  grid  tends  to  preclude  the 
pairing  of  distant  nodes  (in  cost  space).  Furthermore,  the  local  effects  affecting  the  state 
of  the  problem  tend  to  propagate  in  a  more  "controlled"  way,  since  the  area  of  immediate 
influence  of  a  node  consists  of  a  group  of  nodes  which  do  not  necessarily  exist  in  the 


108 


Figure  31.-  Two  different  Aggregations  of  Nodes  in  Cost  Space. 

actual  problem.  It  could  be  said  that  regularity  is  used  in  an  attempt  to  force  predictable 

behavior. 

We  can  describe  the  effect  of  a  regular  grid  in  an  intuitive  manner  observing 
Figure  33.  The  original  problem  (a)  is  shifted  to  a  regular  grid  (b)  that  is  finer  than  the 
original  setting  (which  now  becomes  a  coarse  grid).  As  a  consequence,  the  gaps  of  the 
flow  vector  x  in  (a)  are  regularly  filled  with  intermediate  flows  in  (b),  making  the  vector  x’s 
costs  "appear"  as  a  sampling  of  cost  locations  of  a  more  populated  cost  space. 

In  normal  sampling  (the  PDE  example),  the  magnitudes  of  components  of  the 
x  vector  tend  to  stay  around  the  corresponding  values  in  the  continuous  domain.  On  the 
other  hand,  in  the  case  just  described  above  concerning  regular  grids,  there  is  a  need  to 
combine  neighbor  values  to  obtain  the  value  of  the  "sample"  in  the  next  coarser  grid.  One 
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Figure  32.-  " Bucket "  Aggregation  of  Nodes  (Kaminsky). 

can  imagine  T  e  difficulty  of  devising  a  sampling  process  (in  optimization)  such  that  the 
values  of  flows  in  successive  grid  representations  tend  to  converge,  when  performing 
node  aggregation  in  the  terms  of  the  previous  approaches. 

Since  the  dimensionality  of  the  problem  equals  the  number  of  supply  nodes, 
we  can  imagine  that  the  rapid  growth  in  size  of  the  problem,  already  pointed  out 
(Cavanaugh  etal,  1992)  for  irregular  grids,  will  be  much  more  important  in  the  regular  grid 
case.  This  limits  the  practical  field  of  application  of  regular  grids  to  problems  with  only  a 
few  supply  nodes. 


This  final  discussion  connects  some  of  the  optimization  tools  presented  in  the 
previous  chapters  with  the  work  that  has  been  developed  towards  the  implementation  of 
multigrid  methods  in  optimization.  The  conclusions  are  an  attempt  to  compile  the  limited 
experience  in  applying  multigrid  ideas  in  optimization  with  the  specific  optimization 
techniques  that  could  play  a  relevant  role  when  designing  multigrid  components. 

Three  aspects  will  be  considered.  First  is  the  general  question  of  whether  multigrid 
should  be  applied  to  optimization,  or  vice-versa.  Second,  applications  of  multigrid 
approaches  to  optimization  problems  will  be  suggested.  Third,  the  various  optimization 
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techniques  presented  in  previous  chapters  are  connected  to  the  multigrid  process,  with 
some  insights  and  recommendations  for  future  research. 

1 .  Multigrid  towards  Optimization  versus  Optimization  towards  Multigrtd 

Throughout  the  present  work,  this  question  is  a  constant:  (a)  should  muitigrid 

be  used  as  a  general  way  to  approach  problems,  transferring  the  idea  to  the  solution  of 
optimization  problems,  or  (b)  should  optimization  techniques  be  directly  transferred  to  the 
multigrid  scheme  and  so  build  a  muitigrid  scheme  with  optimization  blocks. 

The  conclusion  of  this  thesis  is  that  the  multigrid  method  is  more  a  philosophy 
of  attacking  problems  than  a  specific  solution  technique.  Multigrid  concepts  can  be  applied 
in  optimization  to  develop  a  strategy  for  approaching  problems.  But  not  all  the  components 
of  a  multigrid  scheme  are  directly  transferable  to  optimization,  where  problems,  in  general, 
are  structured  so  that  local  variations  typically  propagate  widely  into  the  global  problem. 
Nevertheless,  techniques  and  procedures  of  optimization  can  be  used  to  handle  some  of 
the  troublesome  properties  that  these  problems  present  to  a  multigrid  approach.  This  is 
covered  in  the  following  points. 

2.  Field  of  application 

The  previous  point  leads  to  the  consideration  of  how  general  an  approach 
multigrid  is  for  optimization  problems.  The  study  of  previous  work  in  section  D,  led  to  the 
observations  in  section  E,  from  which  it  follows  that  aggregation  of  nodes  is  a  natural 
process  to  be  used  as  a  restriction  operation.  The  experience,  however,  is  not  as 
promising  as  it  appears  at  first  glance.  The  associated  difficulty  -  explained  in  section  E.1 
suggest  that  only  problems  with  a  suitable  structure  are  amenable  to  node  aggregation 
as  a  restriction  operation.  Problems  structured  so  that  clusters  of  nodes  can  be  identified 
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are  specially  oriented  to  a  multigrid  approach.  So,  cluttfing-typ*  of  layouts  are 
expected  to  produce  better  results.  For  example,  in  cost  space,  the  problem  represented 
in  Figure  34(a)  has  two  identifiable  clusters  of  nodes,  with  each  one  formed  by  two  pairs, 
that  can  be  thought  of  as  clusters  at  a  different  level.  That  kind  of  layout  is  expected  to 
be  more  successfully  solved  in  a  multigrid  fashion  that  the  spread  layout  in  Figure  34(b). 
Often,  when  experimenting  with  algorithms,  one  tends  to  randomly  construct  problems. 
This  methodology  is  more  likely  to  produce  designs  like  Figure  34(b).  The  conclusion  here 
is  that  experiments  must  be  designed  carefully  when  node  aggregation  is  the  method 
chosen  for  restriction.  The  number  of  grids  that  can  be  implemented  is  determined  by  the 
separation  between  clusters.  In  general,  the  distance  between  two  members  of  a  cluster 
must  be  small  compared  to  the  distance  between  two  clusters.  Furthermore,  a  new  level 
with  its  corresponding  grid  should  be  introduced  only  if  the  clusters  can  be  partitioned  into 
groups,  such  that  the  distance  between  groups  of  clusters  is  large  when  compared  to  the 
distance  between  individual  clusters  within  a  group.  This  process,  when  iterated, 
determines  the  maximum  number  of  grids  that  a  problem  can  have.  Therefore,  the 
maximum  number  of  grids  that  can  be  introduced  is  determined  by  each  particular 
instance  of  a  problem. 

A  characteristic  to  examine  in  order  to  determine  if  a  problem  is  suitable  for 
multigrid  approach  is  whether  there  exists  a  weak  relation  between  points  of  the  grid 
located  far  apart.  This  can  be  expressed,  for  instance,  in  terms  of  a  function  assigning 
costs  or  other  problem  parameters  according  to  a  rule.  An  example  is  the  case  of 
Kaminsky’s  geometric  transportation  problem.  In  general,  it  is  desirable  that  a  mapping 
could  be  found  so  that  some  of  the  main  defining  characteristics  of  "local"  points  of  the 
problem  could  be  gathered  together  and  be  given  in  the  form  of  a  function.  Kaminsky's 
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Figure  34.-  Clustered  versus  Spread  Structure  of  Problems. 


geometric  transportation  problem,  for  example,  defines  a  map  between  locations  in  tha 
space  and  costs.  With  that  type  of  problem  design  it  is  possible  to  check  for  a  cluster 
structure  favoring  multigrid,  and  determine  the  depth  in  levels  that  can  be  expected. 

Demands  play  an  important  role  when  aggregating.  It  is  conceivable  to 
increment  the  dimension  of  the  cost  space  or  the  physical  space  to  account  for  the 
demands  of  (destination)  nodes.  Another  property  that  could  be  considered  for 
aggregation  is  the  likelihood  that  a  demand  node  be  supplied  by  a  given  source.  This  can 
be  defined  in  terms  of  available  supply,  required  demand  and  reduced  cost.  In  general, 
it  is  unlikely  that  a  perfect  mapping  could  be  found  defining  a  preference  ordering.  Finding 
such  mapping  is  probably  as  difficult  as  solving  the  problem  itself.  Occasionally,  a  few 
properties  or  parameters  have  a  stronger  influence  than  the  rest  in  determining  supply 
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policies,  and  it  is  in  such  cases  when  the  structure  of  the  problem  can  be  expected  to 
support  a  multigrid  approach. 


The  uncapacitated  long  transportation  problem  possesses  some  individual 
characteristics  that  make  it  suitable  for  a  multigrid  approach.  In  Chapter  III.C  we 
mentioned  the  fact  that  an  optimal  extreme  point  solution  to  a  transportation  problem  has 
at  most  m-1  demand  nodes  that  will  be  supplied  by  at  least  two  supply-nodes,  while  the 
remaining  at  least  n-m-t-1  demand  nodes  are  supplied  by  only  one  supply-node.  The 
implication  is  that,  since  n  is  much  larger  than  m,  the  number  of  destinations  served  by 
only  one  origin  is  relatively  high. 

3.  Unconventional  Grids  -  Decomposition 

The  difficulty  with  regular  grids  pointed  out  in  section  E.2  raises  the  question 
of  a  less  conventional  interpretation  of  the  classical  concept  of  grid  for  optimization 
problems. 

In  his  article  "Levels  and  Scales”,  Brandt(1985)  gives  the  following  illustration 
of  the  favorable  characteristics  of  problems  where  multigrid  succeeds.  A  two-level 
hierarchical  structure  should  operate  in  the  following  way.  The  global  government  first 
gathers  some  general  data  summarized  at  the  local  level,  representing  the  sum  totals  of 
local  needs,  important  overall  constraints,  etc.  Based  on  these  it  prepares  preliminary 
global  plans.  These  global  plans  give  the  local  governments  the  framework  for  devising 
their  own,  more  detailed  plans.  These  global  plans  do  not  quite  fit  the  local  situation  and, 
therefore,  need  some  adjustments  or  corrections.  So,  in  a  second  round,  the  global 
government  again  gathers  information  summarized  at  the  local  level,  now  representing 
sum  totals  of  needed  corrections.  Since  in  practice  this  process  is  seldom  fully 
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recognized,  let  alone  fully  effectively  organized,  more  such  rounds  may  be  needed.  When 
more  levels  of  government  are  involved,  the  process  is  applied  recursively,  in  a  variety  of 
manners. 

The  ideal  operation  of  the  two-level  hierarchical  government  is  fully  analogous 
to  a  two-grid  process.  The  problem  is  first  represented  on  the  coarser  grid  (e.g.,  by 
averaging  its  equations  to  the  scale  of  that  grid).  The  (approximate)  solution  to  the 
resulting  coarse-grid  problem,  once  computed,  is  then  interpolated  to  the  fine  grid,  serving 
there  as  first  approximation,  a  framework,  to  be  next  improved  by  fine-grid  processes, 
such  as  relaxation.  This  fine  grid  processing  finds  the  fine  features  of  the  solution  which 
were  invisible  to  the  coarser  grid,  and  also,  as  a  result,  encounters  some  residuals  of 
global  (smooth)  errors,  which  it  cannot  effectively  reduce.  So,  in  the  next  round,  the 
residual  problem  is  approximately  transferred,  by  some  averaging,  to  the  coarse  grid, 
where  it  can  be  efficiently  solved  and  its  solution  is  then  interpolated  back  to  the  fine  grid 
and  added  as  a  correction  to  the  previous  fine-grid  solution.  This  is  a  two-level  full 
multigrid  (FMG).  algorithm. 

At  this  point,  let  us  analyze  and  compare  the  two-level  work  of  a 
decomposition  procedure,  like  those  described  in  Chapter  IV.  Consider  a  large  system  that 
is  composed  of  smaller  subsystems  1 ,  2, ....  T.  Each  subsystem  i  has  its  own  objective, 
and  the  objective  function  of  the  overall  system  is  the  sum  of  the  objective  functions  of  the 
subsystems.  Each  subsystem  has  its  constraints  designated  by  the  set  X,  ,  which  is 
assumed  to  be  bounded.  In  addition,  all  the  subsystems  share  a  few  common  resources, 
and  hence  the  consumption  of  these  resources  by  all  the  subsystems  must  not  exceed 
the  availability  given  by  the  resource  vector  b. 


116 


Recall  that  the  economic  interpretation  of  the  dual  variables  (Lagrangian 
multipliers,  w)  represents  the  rate  of  change  of  the  objective  as  a  function  of  b, .  Hence  - 
w,  can  be  thought  of  as  the  price  of  consuming  one  unit  of  the  i*  common  resource.  With 
this  in  mind,  the  decomposition  algorithm  can  be  interpreted  as  follows  (Bazaraa  et  a/, 
1990).  With  the  current  proposals  of  the  subsystems,  the  superordinate  (total  system) 
obtains  the  optimal  weights  of  these  proposals  and  announces  a  set  of  prices  for  using 
the  common  resources.  These  prices  are  passed  down  to  the  subsystems,  which  modify 
their  proposals  according  to  these  new  prices.  A  typical  subsystem  i  solves  the  following 
subproblem: 

Maximize  (wA,  -  c,)  x,  +  a, 

Subject  to  x,  in  the  set  X 

or  equivalently 

Minimize  (c,  -  wA;)  x,  -  a, 

Subject  to  x,  in  the  set  X 

The  original  objective  of  the  system  i  is  c,  x, .  Note  that  A;  x,  is  the  amount 
of  common  resources  consumed  by  the  im  proposal.  Since  the  price  of  using  these 
resources  is  -w,  then  the  indirect  cost  of  using  them  is  -wA*  x, ,  and  the  total  cost  is  (c- 
wAjjXj.  Note  that  the  term  -wA,  x,  makes  proposals  that  use  much  of  the  common 
resources  unattractive  from  a  cost  point  of  view.  The  mechanism  is  the  following. 
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Subsystem  i  announces  an  optimal  proposal  x* .  If  this  proposal  is  to  be  considered,  then 
the  weight  of  the  older  proposals  x,  must  decrease  to  "make  room"  for  this  proposal;  that 
is  1^,  must  decrease  from  its  present  level  of  1 .  The  resulting  saving  is  precisely  o, .  If 
the  cost  of  introducing  the  proposal  is  less  than  the  saving  realized,  then  the 
superordinate  would  consider  this  new  proposal.  After  all  the  subsystems  introduce  their 
new  proposals,  the  superordinate  calculates  the  optimum  mix  of  these  proposals  and 
passes  down  the  new  prices.  The  process  is  repeated  until  none  of  the  subsystems  has 
a  new  attractive  proposal;  that  is,  when  (ci  -  wAt)  xlk  -  aA  z  o  for  each  i. 

The  case  of  decomposition  resembles,  although  is  not  exact,  the  case  of  the 
two-level  hierarchical  structure  described  above.  The  process  of  decomposition, 
conveniently  helped  by  a  mechanism  to  discover  'easy*  sets  of  constraints,  gives  a 
method  to  detect  hierarchical  structures  in  problems  where  they  are  not  initially  explicit. 
Here  the  conventional  concept  of  grids  is  replaced  by  the  more  general  concept  of  level, 
in  order  to  facilitate  a  mapping  of  the  problem  into  a  somewhat  hierarchical  structure. 

4.  Unconventional  Grids  -  Scaling 

We  can  consider  scaling  as  a  multilevel  technique.  Here  we  will  establish  the 
parallels  between  the  scaling  techniques  and  the  multigrid  approach.  Recall,  for  example, 
cost  scaling.  Here,  the  basic  mechanism  is  work  on  a  state  of  e/2-optimality  for  each  value 
of  the  scaling  parameter  e. 

Recall  (Brandt,  1985)  that  the  discretization  error,  i.e.,  the  difference  between 
the  true  solution  of  the  differential  problem  (in  the  PDE  case)  and  the  exact  solution  of  the 
discretized  equations,  has  a  relative  magnitude  clearly  determined  by  the  relative 
magnitudes  of  the  discretization  mesh  size  and  the  solution  scale.  Thus,  exactly  those 
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errors  that  are  e tow  to  converge  by  relaxation  processes  on  some  grid  can  be 
approximated  on  a  coarser  grid,  where  the  mesh  size  is  comparable  to  their  scale  and 
hence  their  convergence  need  not  be  slow. 

In  the  scaling  procedures,  i.e.,  cost  scaling,  the  scaling  parameter  e  imposes 
a  limitation  "by  design"  in  the  obtainable  solution  similar  to  the  case  of  mesh  size  h  in  the 
conventional  grids.  So  we  can  consider  e  to  represent  a  grid. 

For  a  given  e,  initially  very  coarse,  there  is  a  series  of  pushes  and  relabelling 
steps  to  convert  the  initial  1/2e-optimal  pseudoflow  into  a  l/2e*optimal  flow.  Each  step 
uses  as  its  initial  state  the  final  state  of  the  previous  step.  The  result  is  an  "image"  of  a 
relaxation  mechanism,  which  is  performed  by  successive  local  operations. 

When  the  status  of  1/2e-optimality  is  reached,  the  scaling  parameter  e  is 
halved.  The  state  of  the  problem,  defined  in  this  case  by  the  values  of  the  node  potentials 
and  reduced  costs  of  the  arcs,  is  transferred  to  the  new  grid  defined  by  the  halved  value 
of  the  scaling  parameter,  now  requiring  a  higher  degree  of  refinement  in  the  process  of 
obtaining  the  new  approximate  solution. 

Scaling  suggests  a  more  abstract  concept  of  a  grid,  represented  in  the  scaling 
parameter.  With  the  parallel  operations  described  above,  cost  scaling  could  be  considered 
to  be  an  interpretation  of  a  multilevel  V-cycle,  in  which  the  path  from  the  original  to  the 
coarsest  grid  is  done  in  a  straight  step,  with  the  return  being  traced  through  all  the 
intermediate  "grids"  defined  by  the  successively  halved  scaling  parameter. 

It  is  conceivable,  in  order  to  let  the  coarse  correction  scheme  come  into  play, 
to  establish  a  schematic  method  similar  to  the  following: 

Relax  in  the  finest  grid  (e  =  1)  a  few  times.  Let  the  achieved  solution  be  x(1>. 

Double  the  scaling  parameter  to  move  into  the  next  coarser  grid  2e. 
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The  vector  x(,)  now  has  to  be  converted  into  a  pseudoflow  (restriction).  Call 
the  resulting  pseudoflow  x(2).  The  process  of  obtaining  a  flow  in  this  level  of 
optimality  is  equivalent  to  the  solution  of  the  problem  at  this  level.  That  could 
be  done  by  initiating  a  new  recursive  step,  so  that  the  problem  at  this  point  is 
transferred  to  the  4c  grid 

The  solution  at  the  2c  level  can  be  used  as  a  starting  improved  point  to  relax 
on  the  finest  level. 

A  relaxation  scheme  to  apply  would  consist  of  the  push-relabel  operation 
described  in  Chapter  VI.  Node  potentials  and  reduced  costs  are  essential  elements  to 
update  along  the  grids  in  the  various  steps. 

G.  SUMMARY  FOR  FURTHER  PESEARCH 

These  points  underline  the  need  for  investigation  of  multigrid  approaches  with  other 
than  the  conventional  grids.  The  concept  of  sampling  the  state  of  the  problem  can  be 
extended  to  "expressing  the  state  of  the  problem",  which  does  not  necessarily  involve  grid 
points,  but  simply  levels.  Mechanisms  like  decomposition  or  scaling  are  good  for  handling 
properties  such  as  arc  capacities,  to  be  considered  throughout  the  multigrid  development 
process  in  optimization.  Working  on  the  residual  network  has  not  been  considered  in 
multigrid.  Integrated  in  a  multigrid-type  algorithm,  we  can  apply  these  techniques  to 
perform  relaxation  at  one  level  of  coarsening.  The  scaling-parameter  provides  a  measure 
of  tracking  how  fine  the  relaxation  is.  Furthermore,  it  can  be  used  to  determine  a  sufficient 
degree  of  relaxation  in  the  level.  The  idea  of  pseudoflows  that  the  scaling  methods 
develop  is  an  interesting  way  to  force  problems  to  react  in  a  controlled  way  to  local 
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in  capacity-scaling.  All  these  concepts  look  promising  when  integrated  in  a  multigrid 
scheme. 

Also,  the  work  on  residual  networks  provides  the  checks  for  optimality  (see  Chapter 
II),  that  have  been  systematically  used  in  network  problems.  They  involve  a  follow-up  of 
node  potentials  and  reduced  costs.  In  the  transportation  problem  this  is  very  cheap  to 
compute. 

These  conclusions  should  help  in  opening  new  ways  to  approach  the  multigrid 
design  of  algorithms  or  improve  specific  areas  of  those  already  existing.  Multigrid  concepts 
of  restriction,  interpolation,  and  the  concept  of  "grid*  have  been  moved  to  a  more  abstract 
area,  likely  to  be  productive  in  the  development  of  new  ideas  in  this  field.  New 
optimization  tools  that  engage  and  fit  in  the  multigrid  philosophy  have  been  presented. 
This  opens  up  a  broad  field  of  study  and  experimentation  for  future  researchers. 
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APPENDIX  A 


A.  DANTZH3-W0LFE  DECOMPOSITION  -  NUMERICAL  EXAMPLE 

Consider  the  following  problem 


min 

S.T. 


-  2  x,  -  Xj  -  x3  +  x4 

x1  +  x3  £  2 

x,  +  Xj  +  2  x4  £  3 


Xi 

x,  +  2  Xj 


x 


-  X3  +  x4 

2  x3  +  x4 

Xj,  x*  x4 


<  2 
£  5 
£  2 
<;  6 
>  o 


and  let  the  set  X  consist  of  the  last  five  constraints.  Then 


Denote  x0*  ,  j  =  1,2,...,  t  the  corner  points  of  X,  and  s  the  slack  vector.  The  problem  is 
reformulated  as  follows: 
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min 


s.t. 


c 


g  (Ax^Uj  *  s  =  b 


£>3  *  1  Ai  *  O' 


S  2.  0 


To  start,  we  need  a  first  extreme  point  of  X.  Choose  x(1)  =  (0,  0,  0,  0).  Let  the  starting 
basis  consist  of  s  and  X,  .  Note  that,  at  this  point,  the  vector  of  duals,  (w,  a)  is  the  zero 
vector.  We  will  denote  b  as  the  updated  right  hand  side  in  the  simplex  tableau.  Recall  that 


E 


with  B 1  being  the  inverse  of  the  basic  submatrix  B  (initially,  a  3x3  identity  matrix).  The 
initial  simplex  tableau  will  adopt  the  form 


BASIS  INVERSE 

RHS 

z 

0 

0 

0 

0 

s. 

1 

0 

0 

0 

S2 

0 

1 

0 

2 

0 

0 

1 

1 

Also  recall  that  cx(1>  =  0. 
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Suboroblem 

The  subproblem  is  stated  as 

max  (w  A  -  c)x  +  a 
s.t.  x  e  X 

which,  in  terms  of  the  problem  data,  is  expressed 
max  2  x,  +  Xg  +  x3  -  x4 


Xi 

<  2 

>r 

CM 

+ 

x" 

<;  5 

-Xa  + 

X4 

<  2 

+ 

>f 

C\l 

x« 

<  6 

X,,  Xg, 

X3. 

x4 

£  0 

The  solution  to  this  problem  is  (2,  3/2,  3,  0),  with  objective  function  17/2, 
representing  the  most  favourably  priced  out  of  the  columns  corresponding  to  X’s  in  the 
simplex  tableau.  The  lower  bound  for  the  objective  function  is  -17/2.  (Note:  lower  bounds 
are  calculated  as  the  current  best  overall  objective  function  value,  minus  the  solution  to 
the  last  subproblem,  since  it  gives  the  maximum  of  the  price-outs). 


Master  Step 

The  entering  column  Cw  is  designed  to  be  A,,  and  is  obtained  in  the  standard  form 


cl*> 


price-out 

y  2 


where 


y2 


5 

1_ 

2 

1 
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so 


c(*> 


17 

2 

5 

2 

2 

1 


After  pivoting,  we  obtain  the  following  tableau 


BASIS  INVERSE 

KBi 

2 

-17/10 

0 

0 

-17/5 

1/5 

0 

0 

2/5 

s2 

-7/10 

1 

0 

8/5 

-1/5 

0 

1 

3/5 

So  far,  the  best-known  feasible  solution  to  the  overall  problem  is  given  by 
x  =  ^,x,  +  =  (4/5,  3/5,  6/5,  0) 

and  the  objective  function  value  is  -17/5.  Also,  the  vector  of  duals  is  (-17/10, 0, 0),  as  can 
be  seen  on  the  first  row  of  the  tableau.  Here  the  first  iteration  terminates. 

Suboroblem 

Proceeding  as  in  the  first  iteration,  the  next  subproblem  is  expressed  by 

max  3/10  x,  +  Xg  -  7/10  x3  -  x4 
s.t.  x  e  X  (for  sake  of  brevity) 

giving  x(3)  =  (0,  5/2,  0,  0),  with  objective  function  value  5/2.  X3  is  introduced.  The  lower 
bound  now  is  -17/5  -  5/2  =  -59/10  =  -5.9. 
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Mwtor  9toD 

The  entering  column  is  obtained  in  similar  fashion  as  the  first  case 


y3 


S'1 


Ax, 


-00 

5 

0 

0 

1  0 

5 

S 

5 

10 

2 

2 

-—0  1 

1. 

1. 

5 

The  new  tableau  looks  as 


BASIS  INVERSE 

RHS 

Z 

-6/5 

0 

-5/2 

-49/10 

K 

1/5 

0 

0 

2/5 

S2 

-1/5 

1 

-5/2 

1/10 

^3 

-1/5 

0 

1 

3/5 

Now  the  best-known  feasible  solution  is  x  =  XgX*21  +  XgX^  =  (4/5, 21/10, 6/5,  0),  and  the 
objective  is  -49/10. 


Suborotlem 

max  4/5  x,  +  Xj  -  1/5  x3  -  x4  -  5/2 

s.t.  x  e  X 

which  solved  yields  x<4)  =  (2, 3/2, 0, 0),  with  objective  value  3/5.  So  X.4  is  introduced.  The 
new  lower  bound  is  given  by  -49/10  -  3/5  =  *11/2  =  -5.5. 
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IhMsisuise 

Proceeding  as  above,  the  following  tableau  is  obtained 


BASIS  INVERSE 

RHS 

z 

-1 

-1 

0 

•5 

K 

1/3 

-2/3 

5/3 

1/3 

K 

-1/3 

5/3 

-25/6 

1/6 

*3 

0 

-1 

7/2 

1/2 

the  best-known  solution  is  x  =  XjX12’  +  AgX<3)  +  X4x(4>  =  (1,2,  1,0). 


Subproblem 

max  Ox,  +  0  Xj  •  0  Xg  -  3  x4 
s.t.  x  e  X 

with  solution  x(5)  =  (0,  0,  0,  0)  and  objective  0,  which  is  the  termination  criterion.  Also 
note  that  the  lower  bound  is  -5  -  0  =  -5,  equal  to  the  last  best  objective  value  (therefore 
optimal).  The  optimal  solution  is  then  given  by  x  =  (1,  2,  1,  0),  with  the  objective  value 
equal  to  -5.  It  is  interesting  to  plot  the  successive  best  objective  function  values  (bold 
letters  in  the  RHS  column)  and  the  progress  of  lower  bounds.  This  is  done  in  Figure  35. 
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Figure  35. -  Dantzig-Wolfe  Decomposition  Example.  Convergence  of  Current 


Objective  values  and  successive  Lower  Bounds  (Minimization). 
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APPENDIX  B 


A.  AGGREGATION  METHOD  -  NUMERICAL  EXAMPLE 

This  example  is  from  Balas  (1965).  The  original  example  has  been  slightly  modified. 
Figure  36  presente  the  data  of  the  original  problem.  The  tableau  is  constructed  as 
described  in  Chapter  III,  with  costs  inside  the  cells.  Rows  are  associated  with  supply 
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Figure  36.-  Balas  ’  Aggregation.  Tableau  for  the  Original  Problem. 


nodes  and  columns  with  destinations. 

In  the  figure,  dashed  costs  lines  represent  aggregation  of  nodes.  Boldface  costs 
correspond  to  the  cost  of  each  aggregated  group,  taken  to  be  the  minimum  of  each  group. 
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Supplies  and  demands  are  represented  in  the  last  two  columns  and  rows,  respectively. 
The  supplies  (demands)  corresponding  to  the  aggregated  nodes  are  placed  in  the  last 
column  (row).  They  are  obtained  adding  the  supply  (demand)  of  the  ceils  forming  the 
aggregated  node. 

A  visualization  of  the  problem  is  presented  in  Figure  37.  There,  dark  nodes  are 
destinations.  The  circles  around  a  group  represent  the  aggregated  nodes.  The  numbers 
close  to  each  arc  are  costs.  The  numbers  inside  the  small  circles  are  the  node  indices. 
The  figure  is  illustrative,  and,  although  more  than  one  path  could  be  found  between  two 
nodes,  each  path  with  different  cost,  the  reader  should  interpret  that  the  paths  giving  the 
costs  of  the  tableau  in  Figure  36  must  be  chosen. 

Problem  II  is  constructed  using  the  bold-faced  data  in  Figure  36,  along  with  the 
aggregated  de.nands/supplies.  For  brevity,  the  corresponding  tableau  is  not  included. 
Notice  that  the  indices  i,  j  of  the  aggregated  nodes,  mentioned  in  Chapter  V,  now  run 
along  the  groups  of  cells  inside  each  of  the  dashed  rectangles  in  Figure  36.  Solving 
Problem  II  the  following  solution  is  obtained  (expressed  in  matrix  form): 


x 


15  60  0  0  0  0 

15  0  35  50  0  0 

0  0  0  0  65  0 

0  0  55  0  0  35 


With  the  resulting  solution,  Problem  III  is  formed  as  follows.  Pick  the  groups  of  cells 
identified  by  the  indices  of  those  entries  in  X  (i.e.  (1,1),  (1,2),  (2,1),  (2,3),  (2,4),  (3,5),  (4,3) 
and  (4,6)).  Transfer  the  whole  groups  (original  problem)  into  a  new  tableau,  not  shown. 
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Figure  37.-  "Map"  of  Origins,  Destinations  and  their  aggregation. 
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in  which  the  rest  of  the  entries  will  be  given  a  cost  value  to  force  them  not  to  appear  in 
the  optimal  solution.  Solve  the  problem  so  constructed.  The  solution  is  presented  in  Figure 
38.  The  blank  "blocks'  correspond  to  those  entries  equal  to  zero  in  the  solution  X  to 
Problem  II.  The  last  two  columns  and  rows  now  represent  the  computed  dual  values  for 
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Figure  38.-  Balas  *  Aggregation.  Solution  to  Problem  III. 


the  individual  and  aggregated  nodes,  respectively. 


Next  the  DV  are  computed.  Recall  that  those  are  the  values  of  the  slack  variables 
for  those  blocks  not  in  the  optimal  solution  of  Problem  II.  They  are  represented  in  matrix 
form  as 
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0  0  40  0  -12  43 
0  13  0  0  -37  12 
58  109  €8  0  0  28 

57  1  0  38  -22  0 


If  all  the  entries  in  D\  were  nonnegative,  the  solution  to  Problem  III  would  be 
optimal.  Since  that  is  not  the  case,  we  need  to  compute  the  corresponding  values  of  the 
slack  for  all  those  original  cells  in  the  groups  affected  (i.e.  groups  (1 ,5),  (2,5),  (4,5)).  The 
result  of  those  computations  is  expressed,  in  compact  form 


Di  s 


34  34  34 

10  10  10 
10  10  10 


D2S 


0  -10  -10 
0  -10  -10 


13  -7  17 
13  -7  17 


Since  not  all  the  entries  in  the  above  matrices  are  nonnegative,  those  blocks  of  cells 
must  be  included  in  a  new  iteration  to  form  a  new  Problem  III.  The  associated  tableau  is 
shown  in  Figure  39.  Solving  the  tableau  in  Figure  39,  the  resulting  solution  is  optimal. 
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APPENDIX  C 


A.  MATHEMATICA  RLE  MG001  .M 

(*====sm=a=====s«s^==ssa===s=^=^=========== 

This  demonstration  file  shows  how  the  oscillatory  components  of  the 
error  are  “quickly"  eliminated  by  a  Jacobi  iteration  technique  when 
solving  a  suitable  system  of  linear  equations. 

The  system  used  here  is  the  same  system  considered  in  Briggs  (1987), 
i.e.,  that  yielded  by  the  steady-state  temperature  distribution  in  a 
long  uniform  rod: 

-  u"  +  sigma  u  =  f(x); 
u(0)  =  u(1)  =  0; 
sigma  >=  0; 

The  domain  of  the  problem  is  partitioned  into  ’NN’  subintervals.  The 
length  of  each  subintervals  is  ’IV.  The  linear  system  of  equations  has 
the  form  A  x  =  f 

For  generality,  the  right  hand  side  vector  f  is  a  random  vector  of 
integer  values. 

The  initial  value  of  the  solution  is  defined  as  a  sum  of  Fourier 
“modes",  affected  by  different  coefficients.  The  output  is  a  Table  of 
graphics  ("figure"),  each  representative  of  the  status  of  the  error  in 
the  Jacobi  process.  Applying  "ShowAnimation  [figure]"  to  the  Table, 
a  movie  representation  of  the  error  progress  is  obtained.  Limitations  irr 
the  computer  system  may  force  to  animate  only  a  certain  range  of  the 
graphics  array. 

The  residuals  ’r  =  f  -  A  x’  are  also  obtained.  A  similar  graphic 
for  them  can  be  obtained  by  slight  modification  of  the  code. 

Reference:  "A  Multigrid  Tutorial",  William  Briggs,  SIAM,  1987. 
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NN  =  48;  Nlterations  =  200;  h  =  1/NN//N;  sigma  =  5;  jump  =  3; 
A  =  ZeroMatrix[NN+1]; 

For  [i»1 ,  i<=  Length[A],  ++i,  A[{i,i]]  =  2  +  sigma  h*2]; 

For  [i*1,  i<=  Length[A]-1,  ++i,  A[[i,  i+1]]  =  A[[i+1,  i]]  =  -1]; 

A  =  A/hA2 


f  =  Table[Random[lnteger,  {-5,  9}],  {NN+1}]; 
c  =  Table[Random[Real,  {-1, 1}J,  {NN+1}]; 

TheRoots  =  LinearSolve[N[A].N[f]];  (*  True  solution  *) 

Print  ["1  = ",  ColumnForm[f]];  Print  [‘  "}; 

Print  ["TheRoots  =  ",  ColumnForm[TheRoots]];  Print  ['  “]; 


Diag  =  Table  [A[[i,i]],  {i,1,Length[A]}]; 

DD  =  DiagonalMatrix  [Diag]; 

LL  =  Table  [lf[i  >  j,  -A[[i,j]]t  0], 
{i,1,Length[A]}, 

{j.1  ;Length[A]}]; 

UU  =  Table  [lf[i  <  j,  A[[i,j]],  0], 

[i,1  ,Length[A]}, 

{j,1  ,Length[A]}]; 

DDinv  =  DiagonalMatrix  [1/Diag]; 

PP  =  Dot  [DDinv,  (LL  +  UU)]; 


(*  Diagonal  elements  of  A  *) 
(*  Diagonal  Matrix  *) 


(*  Lower  Triangular  Matrix  *) 


(*  Upper  Triangular  Matrix  *) 
(*  Inverse  of  DD  *) 
(*  Iteration  Matrix  *) 


(*  Initial  value  of  the  solution  *) 

xlnit  =  Tablet  Sum[k  Sin[j  k  Pi  /  NN//N],  {k,  1,10,2}],  {j,1, NN+1}]; 
rlnit  =  f  -  Dot[A,  xlnit]; 


Print  ["Initial  x  =  ",  ColumnForm[xlnit]];  Print  [  " "]; 
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Print  ["Initial  r  =  ",  ColumnForm[rinit]];  Print  [OutFile, " "]; 

errors  =  {TheRoots  -  xlnit);  (*  Initial  value  of  the  error  *) 

residuals  =  {rlnit}; 

x  =  xlnit;  r  =  rlnit;  (*  first  values  in  the  loop  *) 

For  [  i  =  1 ,  i  <=  Nlterations,  ++i,  (*  Iterative  loop  *) 

CompoundExpression  [ 

x  =  Dot[Dot[DDinv,  (LL  +  UU)],  x]  +  Dot[DDinv,  f]; 
r  =  f  -  Do '[A,  x]; 

residuals  =  AppendTo  [residuals,  r]; 

errors  =  AppendTo  [errors,  TheRoots  -  x]  ]; 

Print  ["Iteration  #:  *,i]; 

]; 

B.  MATHEMATICA  FILE  MG002.M 

figure  =  Table  [  ListPlot[errors[[i]], 

PlotJoined  ->  True, 

PlotRange  ->  [-2,  2}, 

PlotLabel  ->  StringForm  ["error  “*,  i], 

DisplayFunction  ->  Identity],  [i,  1 ,  Nlterations,  jump}]  ; 

fig8l  first  =  ListPlot  [TheRoots  -  xlnit, 

PlotJoined  ->  True, 

PlotRange  ->  [-8,  8}, 

PlotLabel  ->  StringForm  (“error  1 "], 

DisplayFunction  ->  Identity]; 

fig81  =  [fig81  first,  figure[[4]],  figure[[8]],  figure[[12]],  figure[[16]], 
figure[[25]],  figure[[35]],  figure[[Length[figure]]]}; 


137 


figure2  =  Table  [  ListPlot[re8kjual8[[i]], 

PlotJoined  ->  True, 

PlotRange  ->  {-400,  400}, 

PlotLabel  ->  StringForm  ["residual  i], 

DisplayFunction  ->  identity],  {i,  1,  Nlterations,  jump}]  ; 

fig82first  =  ListPlot  [f  -  Dot[A,  xlnit], 

PlotJoined  ->  True, 

PlotRange  ->  {-1500, 1500}, 

PlotLabel  ->  StringForm  ["residual  1"], 

DisplayFunction  ->  Identity]; 

fig82  =  {fig82first,  figure2[[4]],  figure2[[8]],  figure2[[12]],  figure2[[16]], 
figure2[[25]],  figure2[[35]],  figure2[[Length[figure2]]]}; 

(*  The  following  lines  instruct  Mathematics  to  show  a  picture  using 
several  representations  of  the  error  term  and  residuals.  The  last 
line  performs  a  movie-representation  of  the  error. 

Show  [GraphicsArray[Partition[figure81,  2]]] 

Show  [GraphicsArray[Partition[figure82,  2]]] 

ShowAnimation  [figure[[Range[1 ,20]]]]  *) 

C.  PROBLEM  DATA 

f  =  9  5  -2  -2  4  1  3  -5  -1  -3  3  9  -3  3  4  9  7  9  6  5  3  -4  7  -2  8 

95  -2  5  -5  2  4  8  2  9  8  5  -2  59-2 

-5  0  2  0  4  7  -1  8 

True  solution  for  ’x’: 

0.02065783515014781 

0.03745425067171521 

0.05216180823033061 

138 


0.06785061971305696 

0.0845547320198134 

0.0997062287276545 

0.1146400740221441 

0.1285206208661609 

0.1448502141964325 

0.1619281803873733 

0.1806596365531132 

0.1984810658884562 

0.2128269767035918 

0.2289368349508098 

0.2442414345933205 

0.2583399609602091 

0.2690928709229314 

0.2773915553451219 

0.2823859679690162 

0.28538903069701 

0.2868412883700927 

0.287613948144673 

0.2907468812442201 

0.291472581013134 

0.2936988723207048 

0.2930903087504995 

0.28921154185727 

0.2837902652892516 

0.2798529085677812 

0.2743527326374735 

0.2716180791304241 

0.2686048190242655 

0.2644383575705032 

0.2573735418579965 
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r 


0.2499992069220913 

0.239261 1 5498731 91 

0.2255701107673631 

0.2101984461280655 

0.19615099686665 

0.1803590836227403 

0.1610523246401646 

0.1429631271259921 

0.1273543183425618 

0.1120218861180346 

0.0960645013894233 

0.0803155899711188 

0.06300486342688363 

0.04279267174251583 

0.0231073738770337 


Initial  guess  for  ’x': 

10.35402435050376 

18.19077280899915 

21.71171356903935 

20.33521033480991 

14.82383446890182 

7.008352827725206 

-0.7972048465103275 

-6.5 

-8.83013611253079 

-7.656495855460165 

-3.930386807407886 

0.7071067811865453 

4.511178065381282 

6.212778468659391 
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5.401937904648413 

2.59807621135332 

•1.000246934522234 

•3.985346991432006 

•5.258443033643013 

-4.413527006719259 

-1.857217543664439 

1.36939385352089 

3.995271049498531 

5. 

3.995271049498533 
1 .369393853520896 
-1.857217543664434 
-4.41352700671926 
-5.258443033643003 
-3.985346991432011 
-1.000246934522246 
2.598076211353307 
5.401937904648421 
6.212778468659399 
4.511178065381289 
0.7071067811865533 
-3.930386807407885 
-7.656495855460135 
-8.8301361125308 
-6.499999999999997 
-0.7972048465103328 
7.008352827725192 
14.82383446890179 
20.33521033480992 
21.71171356903935 
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18.19077280699916 
10.3540243505038 
2.021 494950599223* 1 0M  4 
-10.35402435050376 
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